I am an MS by Research student at the International Institute of Information Technology, Hyderabad (IIIT-H). I am advised by Professor C V Jawahar and Professor Vinay Namboodiri at the Center for Visual Information Technology (CVIT) Lab. I am also advised by Professor K. Madhava Krishna and Professor Srinath Sridhar at the Robotics Research Center (RRC).
   Before IIIT-H, I was a Data Scientist at Microsoft, India. I led the recommendation and suggestion team for Outlook, the world’s biggest enterprise-facing email client. These features are used by more than 100 million users per month; my hunch is that you might have seen some of them!
   I am also a musician: I sing and play guitar. I have toured and performed at several places with my previous band, Andrometa. I also tried my hand at travel vlogging and YouTubing! My brother is a (really awesome) piano player and has taken over the channel now; find them here! I also love traveling. In 2018, I traveled solo to 6 countries and 13 states and interviewed 128 independent musicians!

Research Interest

My interest lies at the intersection of 3D computer vision and robotics. Specifically, I am interested in designing improved representations of the 3D world to enable embodied agents to acquire a holistic view of the world. This way, an agent can make better-informed control decisions for achieving a given downstream goal, for example, manipulation or autonomous navigation.
   Today, most works rely on explicit representations such as pointclouds or voxel grids. But these are limiting in many ways: they are high-dimensional, discrete, and, most importantly, incomplete. They do not capture the underlying structure and only record explicit values at specific locations. I am more interested in implicit representations of the world and how to design improved task-specific representations. Ultimately, I am excited to see embodied AI become a part of the real world and seamlessly integrate with humans!


You can also reach out to me at bipasha dot sen at research dot iiit dot ac dot in.

Recent Updates
  • JAN '23 "SCARP: 3D Shape Completion in ARbitrary Poses for Improved Grasping" accepted at ICRA 2023!
  • JAN '23 I am attending Google Research Week 2023!
  • DEC '22 Presented INR-V at Vision India, Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP)! (presented slides)
  • OCT '22 "INR-V: A Continuous Representation Space for Video-based Generative Tasks" accepted at TMLR 2022!
  • SEP '22 I am grateful to Microsoft Research for awarding me a travel grant of $2000 for WACV 2023!
  • AUG '22 2 papers accepted at WACV 2023!
  • AUG '22 Gave a tutorial on "Computer Vision challenges in Table-top rearrangement and Planning" at the 6th Summer School of AI, IIIT-H! (tutorial video)
  • JUN '22 We are in the news!
  • APR '22 We came 3rd in ICRA 2022 Open Cloud Robot Table Organization Challenge!
  • DEC '21 Granted a Provisional Patent on "SYSTEM AND METHOD FOR TRAINING USERS TO LIP READ"!
  • NOV '21 Joined Robotics Research Center at IIIT-H as a Research Fellow!
  • AUG '21 Joined MS by Research at IIIT-H!
  • JUN '21 "Personalized One-Shot Lipreading for an ALS Patient" accepted at BMVC 2021!
  • MAR '21 Joined Center for Visual Information Technology at IIIT-H as a Research Fellow!
  • SEP '20 I received the "Spot Award" for "Innovation and Impact" at Microsoft!

Selected Research
SCARP: 3D Shape Completion in ARbitrary Poses for Improved Grasping
Bipasha Sen*, Aditya Agarwal*, Gaurav Singh*, Brojeshwar Bhowmick, Srinath Sridhar, Madhava Krishna, ICRA 2023 (project page, video)
We propose SCARP, a model that performs Shape Completion in ARbitrary Poses. Given a partial pointcloud of an object, SCARP learns a disentangled feature representation of pose and shape by relying on rotationally equivariant pose features and geometric shape features trained using a multi-tasking objective.
INR-V: A Continuous Representation Space for Video-based Generative Tasks
Bipasha Sen*, Aditya Agarwal*, Vinay Namboodiri, C V Jawahar, TMLR 2022 (OpenReview, project page, video)
We propose INR-V, a video representation network that learns a continuous space for video-based generative tasks. INR-V parameterizes each video using an implicit neural representation (INR), a multi-layer perceptron that predicts an RGB value for each input pixel location of the video.
FaceOff: A Video-to-Video Face Swapping System
Aditya Agarwal*, Bipasha Sen*, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar, WACV 2023 (project page, paper, video)
We introduce video-to-video (V2V) face-swapping, a novel task of face-swapping that can preserve (1) the identity and expressions of the source (actor) face video and (2) the background and pose of the target (double) video. We propose FaceOff, a V2V face-swapping system that operates by learning a robust blending operation to merge two face videos following the constraints above.
Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale
Aditya Agarwal*, Bipasha Sen*, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar, WACV 2023 (paper)
We propose an end-to-end automated pipeline for building a lipreading training platform using state-of-the-art talking-head video generator networks, text-to-speech models, and computer vision techniques. We then perform an extensive human evaluation using carefully thought-out lipreading exercises to validate the quality of our platform against existing lipreading platforms.
Approaches and Challenges in Robotic Perception for Table-top Rearrangement and Planning
Aditya Agarwal*, Bipasha Sen*, Shankara Narayanan V*, Vishal Reddy Mandadi*, Brojeshwar Bhowmick, K Madhava Krishna, arXiv 2022 (paper, video)
Table-top Rearrangement and Planning is a challenging problem that relies heavily on an excellent perception stack. We present a comprehensive overview and discuss the different challenges associated with the perception module. This work is a result of our extensive involvement in the ICRA 2022 OCRTOC Challenge.
Personalized One-Shot Lipreading for an ALS Patient
Bipasha Sen*, Aditya Agarwal*, Rudrabha Mukhopadhyay, Vinay Namboodiri, C V Jawahar, BMVC 2021 (paper, video, portal)
We propose a personalized network to lipread an ALS patient using only one-shot examples. Our approach achieves a top-5 accuracy of 83.2% for the patient, compared to 62.6% achieved by comparable methods. Apart from evaluating our approach on the ALS patient, we also extend it to people with hearing impairment who rely extensively on lip movements to communicate.