An Approach Towards Action Recognition using Part Based Hierarchical Fusion

Computer Vision, Deep Learning, paper, (accepted at ISVC), 2020

The human body can be represented as an articulation of rigid and hinged joints, which can be combined to form the parts of the body. Human actions can be thought of as the collective action of these parts. Hence, learning an effective spatio-temporal representation of their collective motion is key to action recognition. In this work, we propose an end-to-end pipeline for human action recognition on video sequences using 2D joint trajectories estimated by a pose estimation framework. We use a Hierarchical Bidirectional Long Short-Term Memory network (HBLSTM) to model the spatio-temporal dependencies of the motion by fusing the pose-based joint trajectories in a part-based hierarchical fashion. We are currently extending this research to dynamic scene understanding.
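
To make the part-based hierarchy concrete, here is a minimal sketch of how per-frame 2D joints might be grouped into parts, then regions, then the whole body. The joint names, the five-part grouping, and the two regions are assumptions chosen for illustration, not the grouping used in the paper. In the described model a bidirectional LSTM would process each level's sequence before fusion; those recurrent layers are omitted, so this shows only the fusion by concatenation.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical grouping of 2D joints into body parts (assumed names).
BODY_PARTS: Dict[str, List[str]] = {
    "left_arm": ["left_shoulder", "left_elbow", "left_wrist"],
    "right_arm": ["right_shoulder", "right_elbow", "right_wrist"],
    "trunk": ["neck", "mid_hip"],
    "left_leg": ["left_hip", "left_knee", "left_ankle"],
    "right_leg": ["right_hip", "right_knee", "right_ankle"],
}

# Hypothetical second level: parts fused into larger body regions.
BODY_REGIONS: Dict[str, List[str]] = {
    "upper_body": ["left_arm", "right_arm", "trunk"],
    "lower_body": ["left_leg", "right_leg"],
}


def part_features(frame: Dict[str, Point], joints: Sequence[str]) -> List[float]:
    """Flatten the (x, y) coordinates of the given joints into one vector."""
    features: List[float] = []
    for joint in joints:
        x, y = frame[joint]
        features.extend([x, y])
    return features


def fuse_frame(frame: Dict[str, Point]) -> Dict[str, List[float]]:
    """Build part, region, and whole-body feature vectors for one frame."""
    parts = {
        name: part_features(frame, joints)
        for name, joints in BODY_PARTS.items()
    }
    regions = {
        name: [value for part in members for value in parts[part]]
        for name, members in BODY_REGIONS.items()
    }
    whole_body = regions["upper_body"] + regions["lower_body"]
    return {**parts, **regions, "whole_body": whole_body}
```

Applying `fuse_frame` to every frame of a video yields one sequence per level of the hierarchy, which is the kind of input a hierarchical recurrent model could consume.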

Reed: An approach towards quickly bootstrapping multilingual acoustic models

Speech Recognition, Deep Learning, paper, (accepted at SLT), 2021

Multilingual automatic speech recognition (ASR) systems have been a major step towards building robust ASR for low-resource languages by increasing coverage for individual languages. State-of-the-art multilingual systems are built with sequential networks such as recurrent neural networks (RNNs) to capture long-term temporal dependencies. Training and inference in such sequential models are computationally expensive, which poses a significant challenge for scalability and real-time applications. In this work, we propose an alternative architecture based on short-term contextual temporal features learned by convolutional neural networks (CNNs) with a non-sequential discriminative network. Experiments on three low-resource Indic languages, Gujarati, Tamil, and Telugu, show that the proposed architecture trains 5.5× faster and reduces inference time by a factor of 26 while maintaining word error rates (WERs) comparable to our baseline RNN.
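
For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below is a generic implementation of that standard definition; the function name is illustrative and it is not the evaluation code used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                       # deletion
                    curr[j - 1] + 1,                   # insertion
                    prev[j - 1] + substitution_cost,   # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.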

Knowledge Graph Based Attachment Suggestions

Information Retrieval, Recommendation, (internal), 2020

In this paper we present AiGraph, an enterprise knowledge graph that represents how an employee communicates through emails, meetings, and documents. By representing all of an employee's communication as a graph, we can extract complex insights that are computationally expensive to obtain in siloed applications. We use a recommendation application, Meeting Insights, to show the power of AiGraph. This application recommends related emails and documents for a given meeting. AiGraph can improve Meeting Insights in a number of ways; most significantly, it can improve the relevance of the system by providing better candidate emails and features for a ranker to rank these candidates. In this paper we describe various ways to improve the relevance of Meeting Insights using AiGraph.

Sentence Modelling for Contextual Meeting Segmentation

Natural Language Processing, Summarization, pdf, 2020

We propose a novel technique for contextual meeting segmentation for the task of meeting summarization. Unlike documents, meetings span multiple topics spread throughout the course of the meeting. To capture the true summary of a meeting, it is important to capture the summary of each topic discussed. Existing segmentation approaches ignore the fact that sentences belonging to the same context can be contiguous or non-contiguous. We solve contextual meeting segmentation using a pointer mechanism to extract related sentences from a meeting transcription without assuming that the sentences are consecutive. We are currently extending this approach to end-to-end contextual meeting summarization.

Past

Anterior Segment Imaging - MIT Media Lab’s Rethinking Engineering Design Execution

Anomaly detection, Eye-Care, Hardware, 2015

Remote and economically challenged communities have had limited access to eye care because the slit lamps used for various eye examinations are expensive and bulky. To address these challenges, we built a mobile, low-cost, wearable solid-state replacement device. The device has no moving parts and captures the anterior segment of the eye from two different angles; these images are later used for 3D reconstruction. An anomaly detection algorithm then performs a preliminary examination of the reconstructed anterior segment to identify any abnormalities.

Full-Stack, video, 2013

The portal was developed to provide a one-stop virtual environment that complements the real-world interactions between students and professors. The idea was to build a file-sharing network instead of just a portal. The portal had an operating-system-like user interface for easy operation, with advanced search across groups, contacts, and within groups; auto-sorting that ranked documents by their importance at any given point in time; a discussion forum; and request and push-notification features. It was used extensively at my undergraduate institution, with an average of 5000 active users per month at the time. I was awarded Best Entrepreneur by my institute for my work on thebhaad.com.

SmartShuffle - This is what I wanted to hear!

Reinforcement Learning, Collaborative Recommendation, (undergraduate project), 2016

A prediction model that predicts the songs a user would want to play next, without requiring their intervention, based on the current play history. The model works by detecting similarity between songs to learn a predictive model without using metadata such as the sound wave, song name, or genre. The idea was that similarity between two songs is quite subjective and differs heavily between individuals when the set of available songs is limited. Two songs with completely different meta properties can be perceived as similar by an individual. For the prediction model, I employed an ensemble of reinforcement learning and unsupervised learning algorithms that takes the user's play history as input. The skip rate and the duration for which each song was played were used as the reward to learn relationships between songs.
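
As a rough illustration of how skips and play duration could act as a reward, the sketch below scores each song-to-song transition and nudges a running value towards the observed reward. Everything here is a hypothetical simplification: the reward formula, the `TransitionValues` class, and the learning rate are assumptions for illustration, not the ensemble used in the project.

```python
from typing import Dict, Hashable, Sequence, Tuple


def play_reward(seconds_played: float, song_length: float, skipped: bool) -> float:
    """Hypothetical reward: fraction of the song heard, penalized when skipped.

    A full listen gives 1.0; an early skip approaches -1.0.
    """
    if song_length <= 0:
        raise ValueError("song_length must be positive")
    fraction = max(0.0, min(seconds_played / song_length, 1.0))
    return fraction - 1.0 if skipped else fraction


class TransitionValues:
    """Running value estimate for playing `next_song` after `previous_song`."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.learning_rate = learning_rate
        self.values: Dict[Tuple[Hashable, Hashable], float] = {}

    def update(self, previous_song: Hashable, next_song: Hashable, reward: float) -> None:
        # Move the stored value a step towards the observed reward.
        key = (previous_song, next_song)
        old = self.values.get(key, 0.0)
        self.values[key] = old + self.learning_rate * (reward - old)

    def best_next(self, previous_song: Hashable, candidates: Sequence[Hashable]) -> Hashable:
        # Pick the candidate with the highest learned value; unseen pairs count as 0.
        return max(candidates, key=lambda s: self.values.get((previous_song, s), 0.0))
```

Note that no song metadata enters the update: relationships between songs emerge only from how the user reacted to past transitions, which matches the motivation described above.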

The bot will help you shop!

Reinforcement Learning, (internship project), 2016

During my internship at Microsoft, I worked on a virtual shop assistant whose responsibility was to proactively engage users and assist them towards task completion. From a set of curated questions, the agent needed to learn the most efficient order in which to ask questions in order to maximize engagement and win rate. I worked with a reinforcement learning framework developed by Microsoft Research called multi world testing, and implemented the policy and reward to develop and train the model.

Bipasha Sen