Proactive Hearing Assistants that Isolate Egocentric Conversations

Published: arXiv:2511.11473v1
Authors

Guilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota

Abstract

We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/

Paper Summary

Problem
Imagine you're in a noisy coffee shop trying to have a conversation with a friend, but it's hard to focus on what they're saying over all the other conversations around you. This is a common problem for people with hearing loss, who often struggle to distinguish between voices in crowded environments. Existing hearing aids and devices can help, but they usually require explicit prompts from the user (such as selecting a target speaker), which is impractical in dynamic, multi-party conversations.
Key Innovation
Researchers have developed a new type of hearing assistant that can automatically identify and separate the wearer's conversation partners from other voices in real-time, without requiring explicit user prompts. This system uses a combination of audio processing and machine learning to analyze the wearer's self-speech and infer conversational partners based on turn-taking behavior and dialogue dynamics.
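The abstract describes a dual-model architecture: a lightweight streaming model that runs on every 12.5 ms audio frame for low-latency extraction, and a slower model that runs less frequently to capture longer-range conversational dynamics. The sketch below illustrates that scheduling pattern only; the class names, the slow-model cadence, and the placeholder logic are all illustrative assumptions, not the authors' actual implementation.

```python
FRAME_MS = 12.5           # fast-model hop size stated in the abstract
SLOW_EVERY_N_FRAMES = 80  # assumed cadence: slow model runs roughly once per second

class FastExtractor:
    """Low-latency per-frame extraction, conditioned on cached context."""
    def process(self, frame, context):
        # Placeholder: scale the frame by the current partner gain.
        return [s * context["partner_gain"] for s in frame]

class SlowContextModel:
    """Infers conversation partners from a longer audio history."""
    def update(self, history):
        # Placeholder for the paper's idea of using the wearer's
        # self-speech and turn-taking dynamics to pick partners:
        # here we simply pass audio through when any energy is present.
        energy = sum(abs(s) for frame in history for s in frame)
        return {"partner_gain": 1.0 if energy > 0 else 0.0}

def run_pipeline(frames):
    fast, slow = FastExtractor(), SlowContextModel()
    context, history, out = {"partner_gain": 1.0}, [], []
    for i, frame in enumerate(frames):
        history.append(frame)
        if i % SLOW_EVERY_N_FRAMES == 0:
            # Slow path: refresh conversational context infrequently.
            context = slow.update(history)
        # Fast path: runs on every 12.5 ms frame.
        out.append(fast.process(frame, context))
    return out
```

The key design point is that the expensive reasoning about who the partners are is decoupled from the cheap per-frame extraction, so the output audio stays low-latency while the partner estimate is refreshed in the background.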
Practical Impact
This innovation has the potential to improve communication access for individuals with hearing loss, particularly in dynamic and noisy environments. By automatically adapting to conversational dynamics, the system can help users focus on the conversation they're interested in and reduce listening fatigue. This could be especially beneficial in settings like classrooms, meetings, or social gatherings.
Analogy / Intuitive Explanation
Think of this system like a personal assistant that helps you tune into a specific radio station in a crowded city. Just as a radio station can filter out static and other signals to bring you your favorite music, this hearing assistant can filter out other voices to bring you the conversation you want to hear. By using the wearer's self-speech as an anchor, the system can dynamically adjust to changes in the conversation and provide a more personalized listening experience.
Paper Information
Categories:
cs.CL cs.SD eess.AS