Research

My research focuses on state estimation, computer vision, and human-robot interaction. I have developed algorithms for fusing multiple sensors to improve scene awareness in search and rescue missions. My work on sensor networks enables efficient communication between connected agents and faster convergence to consensus. More details about my research are provided below.
Ultra-wideband (UWB) SLAM with Smartphones

We proposed to incorporate the camera and IMU sensors on mobile devices into a traditional Indoor Positioning System (IPS) to improve localization. The IPS relies on ultra-wideband (UWB) radar to deliver a coarse estimate of the device's location, but UWB communication is often unreliable. We address the shortcomings of UWB by exploiting the complementary sensors already present on the device, such as the camera and IMU, and by leveraging the structure of the surrounding environment to estimate the device's location.
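
As a rough illustration of how such a loosely coupled fusion could work, the sketch below propagates a constant-velocity state with IMU accelerations and corrects it with coarse UWB position fixes. The state layout, sample rate, and noise values are illustrative assumptions, not the actual system.

```python
import numpy as np

# Hypothetical loosely coupled fusion: a constant-velocity Kalman filter that
# propagates the state with IMU acceleration and corrects it with coarse UWB
# fixes. State: [x, y, vx, vy]; all parameters below are placeholders.
dt = 0.05                                        # IMU sample period (s), assumed
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.vstack([0.5 * dt**2 * np.eye(2), dt * np.eye(2)])
H = np.hstack([np.eye(2), np.zeros((2, 2))])     # UWB observes position only
Q = 0.1 * np.eye(4)                              # process noise (placeholder)
R = 0.5 * np.eye(2)                              # UWB noise, m^2 (placeholder)

def predict(x, P, accel):
    """Propagate the state with an IMU acceleration sample."""
    x = F @ x + B @ accel
    P = F @ P @ F.T + Q
    return x, P

def update(x, P, uwb_xy):
    """Correct the state with a coarse UWB position fix."""
    y = uwb_xy - H @ x                           # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P
```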


Natural Language-based Danger Assessment and SaR Planning

We proposed a danger estimation pipeline that combines natural language descriptions from the mission chief with image data from an on-site robot to assess danger. Building on this danger estimate, we proposed a risk-aware planning framework that ensures the safety of the victim. Our simulation study demonstrates that the framework achieves a high mission success rate in a search and rescue environment.
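
To give a flavor of risk-aware planning on top of a danger estimate, here is a minimal sketch that runs Dijkstra search on a grid where each step's cost adds a danger penalty. The 4-connected grid, the linear cost trade-off, and the weight are illustrative choices, not the paper's exact formulation.

```python
import heapq
import numpy as np

def plan_risk_aware_path(danger, start, goal, risk_weight=5.0):
    """Dijkstra search on a grid whose step cost adds a danger penalty.

    `danger` is a 2-D array of per-cell danger estimates in [0, 1];
    `start` and `goal` are (row, col) tuples.
    """
    rows, cols = danger.shape
    dist = {start: 0.0}
    parent = {}
    frontier = [(0.0, start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue                       # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1.0 + risk_weight * danger[nr, nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (nd, (nr, nc)))
    # Walk back from the goal to reconstruct the path.
    path, cell = [goal], goal
    while cell != start:
        cell = parent[cell]
        path.append(cell)
    return path[::-1]
```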


We consider the problem of finding people in an environment based on natural language descriptions of their appearance. We investigate a back-and-forth communication strategy in which the robot asks for incremental information about the person based on the current uncertainty in its prediction. By controlling the number of questions asked of the user, our approach allows the robot to regulate the duration of the human-robot interaction, thus improving the user experience.
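
The sketch below conveys the general idea: keep a belief over candidate people, stop asking once the belief's entropy is low enough or a question budget is reached, and update the belief with each answer. The stopping rule, the simple iteration over remaining attributes, and the helper names (`ask_user`, `attribute_likelihoods`) are illustrative assumptions, not the paper's exact strategy.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a belief over candidate persons."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def ask_questions(belief, attribute_likelihoods, ask_user,
                  max_questions=3, confidence=0.2):
    """Ask incremental attribute questions until the belief is confident enough.

    `belief` is a probability vector over candidates; `attribute_likelihoods`
    maps an attribute name to a function that turns the user's answer into a
    per-candidate likelihood vector.
    """
    asked = set()
    for _ in range(max_questions):
        if entropy(belief) < confidence:
            break                          # confident enough: stop asking
        remaining = [a for a in attribute_likelihoods if a not in asked]
        if not remaining:
            break
        attribute = remaining[0]           # the paper picks this more cleverly
        asked.add(attribute)
        answer = ask_user(attribute)       # e.g. "What color is their shirt?"
        likelihood = attribute_likelihoods[attribute](answer)
        belief = belief * likelihood       # Bayesian update of the belief
        belief = belief / belief.sum()
    return belief
```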


Person Re-Identification (re-ID) with Attributes using Zero-shot Learning (ZSL)

Prior attribute-based person re-identification models require the user to provide a fixed set of visual attributes to identify the right person. However, in many real-world scenarios, such as search and rescue missions, not all of these attributes are available. We conducted a human-subject study to identify the visual attributes of a person that humans tend to recall consistently. We then improved re-identification accuracy by developing models that rely only on these consistently recalled attributes.
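
A minimal sketch of matching against a partial description is shown below: gallery people are scored only over the attributes the witness actually provided, so missing attributes never penalize a candidate. The equal weighting and the toy gallery are illustrative simplifications, not our trained model.

```python
import numpy as np

def attribute_match_scores(query, gallery):
    """Score gallery people against a partial attribute description.

    `query` maps attribute names to values (only the attributes that were
    actually recalled); `gallery` is a list of dicts with full annotations.
    """
    scores = []
    for person in gallery:
        matches = [query[a] == person.get(a) for a in query]
        scores.append(np.mean(matches) if matches else 0.0)
    return np.asarray(scores)

# Hypothetical usage: the witness recalls only gender and shirt color.
gallery = [{"gender": "female", "shirt": "red", "hair": "long"},
           {"gender": "male", "shirt": "red", "hair": "short"}]
print(attribute_match_scores({"gender": "male", "shirt": "red"}, gallery))
```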


Simultaneous Localization and Mapping (SLAM)

We studied the effect of orientation uncertainty on the localization and mapping problem in mobile robots. To address it, we proposed a two-step approach that first estimates the robot's orientation and then optimizes for the best position estimate. A comparison with state-of-the-art methods showed that our method is robust to high sensor noise levels.
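
The appeal of decoupling is that once the orientations are fixed, the relative-translation constraints become linear in the positions. The 2-D sketch below shows that second step under assumed inputs: a list of pose-graph edges (i, j, t_ij) with t_ij measured in frame i, and orientation estimates `thetas` from the first step; anchoring the first pose at the origin is an illustrative choice.

```python
import numpy as np

def solve_positions(edges, thetas, n_poses):
    """Solve for 2-D positions given fixed orientation estimates.

    Builds the linear system p_j - p_i = R(theta_i) @ t_ij for every edge
    and solves it in the least-squares sense.
    """
    A = np.zeros((2 * len(edges) + 2, 2 * n_poses))
    b = np.zeros(2 * len(edges) + 2)
    for k, (i, j, t_ij) in enumerate(edges):
        c, s = np.cos(thetas[i]), np.sin(thetas[i])
        R_i = np.array([[c, -s], [s, c]])
        A[2 * k:2 * k + 2, 2 * j:2 * j + 2] = np.eye(2)
        A[2 * k:2 * k + 2, 2 * i:2 * i + 2] = -np.eye(2)
        b[2 * k:2 * k + 2] = R_i @ t_ij
    A[-2:, 0:2] = np.eye(2)          # anchor the first pose at the origin
    positions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return positions.reshape(n_poses, 2)
```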


Decentralized Pedestrian Tracking

Visual features obtained from deep learning models have shown great success in person re-identification. However, these approaches scale poorly in multi-sensor tracking problems, where the number of stored features grows linearly with time. We propose an efficient way to compress these features into a small number of bins based on the person's orientation, retaining crucial appearance details. Furthermore, the compressed gallery of features can easily be shared with other connected agents, facilitating multi-robot tracking. Our evaluation on the DukeMTMC dataset shows superior tracking performance for our approach.
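
The sketch below illustrates the binning idea under simple assumptions: each person keeps one running-mean feature per viewing-angle bin, so the gallery stays a fixed size no matter how long the track is and is cheap to transmit to other agents. The bin count, feature dimension, and running-mean aggregation are illustrative choices, not the exact method.

```python
import numpy as np

class OrientationBinnedGallery:
    """Compress a person's appearance features into a fixed set of angle bins."""

    def __init__(self, n_bins=8, feat_dim=128):
        self.n_bins = n_bins
        self.means = np.zeros((n_bins, feat_dim))   # one mean feature per bin
        self.counts = np.zeros(n_bins, dtype=int)

    def add(self, feature, orientation_rad):
        """Fold a new feature into the bin for its viewing angle."""
        b = int((orientation_rad % (2 * np.pi)) / (2 * np.pi) * self.n_bins)
        self.counts[b] += 1
        self.means[b] += (feature - self.means[b]) / self.counts[b]

    def score(self, feature):
        """Cosine similarity of a query feature against the best-matching bin."""
        valid = self.counts > 0
        if not valid.any():
            return 0.0
        gallery = self.means[valid]
        sims = gallery @ feature / (
            np.linalg.norm(gallery, axis=1) * np.linalg.norm(feature) + 1e-12)
        return float(sims.max())
```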


Accelerating Consensus in Multi-agent Systems

Reaching consensus is a critical problem for a set of connected agents. We introduced a model for network evolution in which agents update their state using a weighted linear combination of their current and past states. We proved that our model converges faster than existing linear models and established its rate of convergence.
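
As a rough sketch of a memory-based consensus iteration, the snippet below mixes the current state (after one round of neighbor averaging) with the previous state, x(t+1) = (1 + beta) * W x(t) - beta * x(t-1). This particular weighting, the value of beta, and the toy 4-agent ring network are illustrative assumptions; the paper's update rule and analysis may differ.

```python
import numpy as np

def accelerated_consensus(W, x0, beta=0.3, n_steps=100):
    """Consensus iteration that combines the current and previous states.

    `W` is a doubly stochastic weight matrix of the network and `x0` holds
    the agents' initial values.
    """
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(n_steps):
        # x(t+1) = (1 + beta) * W @ x(t) - beta * x(t-1)
        x, x_prev = (1 + beta) * (W @ x) - beta * x_prev, x
    return x

# Hypothetical 4-agent ring network with symmetric, doubly stochastic weights.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
print(accelerated_consensus(W, np.array([1.0, 4.0, 2.0, 7.0])))  # -> approx 3.5 each
```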