guilinhu/proactive_hearing_assistant

guilinhu / proactive_hearing_assistant

Code for the paper Proactive Hearing Assistants that Isolate Egocentric Conversations

☆46

Alternatives and similar repositories for proactive_hearing_assistant

Users who are interested in proactive_hearing_assistant are comparing it to the libraries listed below.

360CVGroup / RefTon
End2End Virtual Try-on with Visual Reference, CVPR2026
☆72 · Apr 18, 2026 · Updated 3 months ago
Sreyan88 / LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
☆19 · Jul 16, 2024 · Updated 2 years ago
ysy31415 / EffectMaker
Code repo for EffectMaker: Unifying Reasoning and Generation for Customized Visual Effect Creation
☆42 · Mar 6, 2026 · Updated 4 months ago
AiEson / Part-X-MLLM
[ICLR 26] Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
☆119 · Jun 17, 2026 · Updated last month
kszpxxzmc / ViSAudio
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
☆117 · Dec 11, 2025 · Updated 7 months ago
Visual-AI / Inpaint4Drag
[ICCV 2025] Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
☆94 · Nov 30, 2025 · Updated 7 months ago
AIGeeksGroup / UniMesh
UniMesh: Unifying 3D Mesh Understanding and Generation
☆57 · Jul 14, 2026 · Updated last week
XiaokunSun / MorphAny3D
[CVPR 2026] Official repo of "MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing"
☆110 · Apr 13, 2026 · Updated 3 months ago
snowflakewang / CustomX
[ECCV 2026] CustomX: Unified Character, Action, and Scene Customization in Video World Models
☆96 · Jun 25, 2026 · Updated last month
JiazheWei / PosterCopilot
☆198 · Dec 10, 2025 · Updated 7 months ago
mo230761 / UniGeo
A framework for camera-controllable image editing using unified geometric guidance and video models.
☆65 · Jun 25, 2026 · Updated last month
yangdongchao / UniAudio2Demo
☆26 · Feb 10, 2026 · Updated 5 months ago
JAMESYJL / Nano3D
[ICLR 2026] NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
☆177 · Apr 2, 2026 · Updated 3 months ago
rlresearch / dr-tulu
Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
☆691 · Jun 17, 2026 · Updated last month
z0rc / megahal.mod
MegaHAL eggdrop module with UTF-8 support
☆13 · Dec 7, 2015 · Updated 10 years ago
guyyariv / DyPE
[ICML 2026] Official implementation for "DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion".
☆356 · May 18, 2026 · Updated 2 months ago
fclearner / Personal-vad-2.0
Implementation of "Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition"
☆16 · Jun 9, 2026 · Updated last month
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 · Jul 4, 2024 · Updated 2 years ago
sjtuplayer / UltraGen
[AAAI 2026] UltraGen
☆77 · Feb 1, 2026 · Updated 5 months ago
RCHI-Lab / voicepilot
☆19 · May 26, 2026 · Updated 2 months ago
gitcommitshow / resilient-llm
Resilient multi-LLM orchestration with in-built failure handling, rate limits, retries, and circuit breaker.
☆48 · Jun 1, 2026 · Updated last month
felixtaubner / mvp4d
Official repository for the paper "MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars"
☆43 · Mar 24, 2026 · Updated 4 months ago
Aratako / MioTTS-Inference
Inference server for MioTTS, a lightweight and fast LLM-based TTS model.
☆197 · Feb 14, 2026 · Updated 5 months ago
AMD-AGI / Nitro-E
Nitro-E is a family of text-to-image diffusion models focused on highly efficient training.
☆125 · Jun 4, 2026 · Updated last month
EditCrafter / EditCrafter
The official repository of EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model (CVPRW 2026)
☆50 · Apr 19, 2026 · Updated 3 months ago
mlantz / DND-Alfred-Workflow
Alfred Workflow timer
☆10 · Jan 14, 2019 · Updated 7 years ago
llm-jp / llama-mimi
Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…
☆31 · Sep 20, 2025 · Updated 10 months ago
RanaCM / DSU-AVO
Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023
☆12 · May 13, 2024 · Updated 2 years ago
LemonSky1995 / DreamStyle
DreamStyle: A Unified Framework for Video Stylization
☆124 · Jan 7, 2026 · Updated 6 months ago
Mrunal-G / Casual-turn-taking-and-backchannel-prediction
☆16 · Jun 25, 2024 · Updated 2 years ago
ryunuri / Elevate3D
☆192 · Jul 31, 2025 · Updated 11 months ago
ErikEkstedt / conv_ssl
☆14 · Feb 9, 2023 · Updated 3 years ago
taco-group / Pulse-of-Motion
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
☆71 · Mar 26, 2026 · Updated 4 months ago
Maddog241 / mvinverse
[CVPR2026] Code Release of MVInverse: Feedforward Multi-view Inverse Rendering in Seconds
☆192 · Apr 1, 2026 · Updated 3 months ago
rikishimizu / MeanFlow-TSE
☆26 · Jun 10, 2026 · Updated last month
JavisVerse / JavisGPT
[NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation"
☆75 · Feb 26, 2026 · Updated 5 months ago
wonjune-kang / expressive-speech-retrieval
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15 · Aug 18, 2025 · Updated 11 months ago
Clovermax / AED-TSVAD
Attention-Based Encoder-Decoder Target-Speaker Voice Activity Detection for Robust Speaker Diarization
☆31 · Sep 22, 2025 · Updated 10 months ago
Jiang-Yidi / TS-TalkNet
INTERSPEECH2023: Target Active Speaker Detection with Audio-visual Cues
☆61 · May 29, 2023 · Updated 3 years ago