Research
My interests lie broadly in computer vision, machine learning, and natural language processing. I have recently become interested in understanding semantics from vision and language by solving multimodal AI tasks such as visual dialog, visual question answering, and conversational text generation using deep learning tools.
Publications
-
Seungwhan Moon*, Satwik Kottur*, Paul A. Crook^, Ankita De^, Shivani Poddar^, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard
*,^ equal contribution
Situated and Interactive Multimodal Conversations
International Conference on Computational Linguistics (COLING), 2020
Oral Presentation
-
Paul Pu Liang, Jeffrey Chen, Ruslan Salakhutdinov, Louis-Philippe Morency, Satwik Kottur
On Emergent Communication in Competitive Multi-Agent Teams
International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2020
Oral Presentation
-
Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019
Oral Presentation
-
Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
European Conference on Computer Vision (ECCV), 2018.
-
Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabás Póczos, Ruslan Salakhutdinov, Alex Smola
Deep Sets
Oral Presentation
Conference on Neural Information Processing Systems (NIPS), 2017.
-
Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Natural Language Does Not Emerge ‘Naturally’ in Multi-Agent Dialog
Best Short Paper Award
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.
-
Abhishek Das*, Satwik Kottur*, José M. F. Moura, Stefan Lee, Dhruv Batra
* equal contribution
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Oral Presentation
International Conference on Computer Vision (ICCV), 2017.
-
Manzil Zaheer*, Satwik Kottur*, José M. F. Moura, Amr Ahmed, Alex Smola
* equal contribution
Canopy – Fast Sampling with Cover Trees
International Conference on Machine Learning (ICML), 2017.
-
Satwik Kottur, Vitor Carvalho, Xiaoyu Wang
Exploring Personalized Neural Conversational Models
International Joint Conference on Artificial Intelligence (IJCAI), 2017.
-
Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, Deshraj Yadav, José M. F. Moura, Devi Parikh, Dhruv Batra
Visual Dialog
Spotlight, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
-
Satwik Kottur*, Ramakrishna Vedantam*, José M. F. Moura, Devi Parikh
* equal contribution
Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
Manzil Zaheer, Michael Wick, Satwik Kottur, Jean-Baptiste Tristan
Comparing Gibbs, EM and SEM for MAP Inference in Mixture Models
OPT: NIPS Workshop on Optimization for Machine Learning, 2015.
-
Evgeny Toropov, Liangyan Gui, Shanghang Zhang, Satwik Kottur, José M. F. Moura
Traffic Flow from a Low Frame Rate City Camera
Big Data Processing and Analysis (special session) in IEEE International Conference on Image Processing (ICIP), 2015.
Other projects
-
Spoken Dialog System with Audio and Text
Topics in Deep Learning (10-807), Fall 2016
Instructor: Prof. Ruslan Salakhutdinov
Abstract:
The growth of technology places great importance on human-machine interaction. With the advent of deep learning, conversational agents that engage with humans through free-form natural language have become popular. However, most of these attempts are purely text-based and ignore audio cues. Emotional information in human-human conversations, encoded in audio cues such as intonation and pitch, often plays an important role that is ignored by text-based approaches. In other words, a response depends not only on what you say but also on how you say it. Thus, in this work we explore generative models that are trained jointly on audio and text.
-
Stochastic Expectation Maximization for Latent Variable Models
Convex Optimization (10-725), Fall 2015
Instructor: Prof. Ryan Tibshirani
Abstract:
In this project, we implement and study a type of stochastic optimization. The method, based on expectation-maximization, is asynchronous and embarrassingly parallel, and is thus useful for inference in latent variable models. The motivation comes from the desire to design an inference procedure directly from a “comptastical” (computational + statistical) perspective, one capable of leveraging modern computational resources such as GPUs or cloud computing that offer massive parallelism. We also find an interesting connection between stochastic expectation-maximization and stochastic gradient descent, which strengthens the validity of the proposed method.
-
Non-smooth Stochastic Optimization for MCMC
Probabilistic Graphical Models (10-708), Spring 2015
Instructor: Prof. Eric Xing
Abstract:
How do we sample efficiently from the Bayesian Lasso in a high-dimensional problem with a large dataset? Hybrid Monte Carlo (HMC) has grown in popularity because it enables more efficient exploration of the state space in high-dimensional problems. In addition, Stochastic Gradient-HMC has been proposed to enable the application of HMC to large datasets. However, these methods apply only to sampling from smooth energy functions. We propose two ways of dealing with this: (1) SPG-HMC: Stochastic Proximal Gradient-HMC, to enable sampling from non-smooth energy functions without losing the benefits of stochasticity, and (2) Smoothing-SG-HMC. Further, we analyze their properties theoretically and empirically.
-
Movie Recommendation based on Collaborative Topic Modeling
Machine Learning (10-701), Fall 2014
Instructor: Prof. Geoff Gordon and Prof. Aarti Singh
Abstract:
Traditional collaborative filtering relies on ratings provided by viewers in the movie-watching community to make recommendations to the user. In this project, we attempt to combine this approach with probabilistic topic modeling techniques to make recommendations that consist not only of movies that are popular in the community, but also those that are similar in content to movies that the user has enjoyed in the past.
-
Detecting Text in Natural Images
Computer Vision (16-720), Fall 2014
Instructor: Prof. Martial Hebert
Abstract:
Intelligent systems often need to read text in their surroundings. Several aspects make this a challenging problem. For instance, locating and identifying the part of an image that contains text is in itself difficult. We study a recent approach that uses the stroke width transform, and analyse its success and failure cases to get a clearer understanding.
-
Static Vehicle Detection and Analysis in Aerial Imagery using Depth
Internship at IRIS, University of Southern California, Summer 2013
Guide: Prof. Gerard Medioni
Abstract:
This report proposes an approach to automatically detect static vehicles in an outdoor parking space using depth. The relevant 3D information is generated from a Digital Surface Model (DSM), which is a result of a novel and existing technique to solve camera pose estimation and dense reconstruction simultaneously. Validation using local 2D features, based on existing methods, is then performed to improve detection rates. Further, the performance of the detection system is evaluated under varying internal parameters of the 3D model generation, and its dependence on them is analyzed.
-
Human Activity Recognition
B.Tech project-I, IIT Bombay, Fall 2013
Guide: Prof. Subhasis Chaudhuri
Abstract:
Human activity recognition is gaining importance, not only from the standpoint of security and surveillance but also because of psychological interest in understanding human behavioral patterns. This project is a study of various existing techniques that have been brought together to form a working pipeline for studying human activity in social gatherings. Humans are first detected with deformable part models and tracked as feature points in a 2.5D coordinate system using the Lucas-Kanade algorithm. A linear cyclic pursuit model is then employed to predict short-term trajectories and understand behavior.
-
Autonomous Underwater Vehicle (AUV-IITB)
AUVSI and ONR’s International Robosub Competition, San Diego, USA
Vision (Spring 2012 - Spring 2013)
Guides: Dr. Hemendra Arya and Dr. Leena Vachhani
Details:
Designing and developing an unmanned autonomous underwater vehicle (AUV) that localizes itself and performs realistic missions based on feedback from visual, inertial, acoustic and depth sensors using thrusters and pneumatic actuators.
Matsya (the Sanskrit word for fish) is IIT Bombay’s AUV entry in the International Robosub Competition in San Diego, which draws university teams from countries all over the world.
-
Parallel Simulation of Verilog HDL designs
Internship, IIT Bombay, Summer 2012
Guide: Prof. Sachin Patkar
Abstract:
Digital designs are simulated on a computer platform before synthesis to test their efficiency. Maximizing simulation performance and minimizing overheads is therefore a vital area of research. The main focus of this work is to parallelize the simulation of single-clock structural/behavioral hardware designs without any time or resource conflicts, resulting in a multi-fold reduction in execution time. I was awarded the Undergraduate Research Award (URA 01) for my contribution to research at IIT Bombay.
You can find my other undergraduate projects here.
Here is a list of all the courses I have taken, both during graduate and undergraduate studies.