JiabenChen / iQuery
[CVPR 2023] iQuery: Instruments as Queries for Audio-Visual Sound Separation
☆68 · Updated last year
Alternatives and similar repositories for iQuery
Users who are interested in iQuery are comparing it to the libraries listed below.
- [NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis ☆27 · Updated last year
- Code release for PianoMotion10M ☆83 · Updated 3 months ago
- Hearing Anything Anywhere Code Release ☆41 · Updated last year
- Bidirectional Mapping between Action Physical-Semantic Space ☆31 · Updated 9 months ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation ☆39 · Updated last year
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models ☆33 · Updated last year
- Code and datasets for 'Few-Shot Audio-Visual Learning of Environment Acoustics' (NeurIPS 2022) ☆19 · Updated last year
- Download scripts and tools for Replay dataset. ☆33 · Updated 2 years ago
- ☆20 · Updated last year
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis ☆11 · Updated 8 months ago
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation… ☆37 · Updated 4 months ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆108 · Updated 7 months ago
- Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021) ☆15 · Updated 2 years ago
- ☆35 · Updated 2 months ago
- ☆27 · Updated 2 years ago
- Official implementation of EgoHOD at ICLR 2025; 14 EgoVis Challenge Winners in CVPR 2024 ☆18 · Updated 3 months ago
- ☆31 · Updated last year
- ☆84 · Updated 3 weeks ago
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes" ☆54 · Updated last year
- ☆17 · Updated last year
- The official implementation of the work "AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward". ☆15 · Updated 3 months ago
- M3GPT: An advanced multimodal, multitask framework for motion comprehension and generation. ☆15 · Updated 6 months ago
- [ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding ☆43 · Updated 2 years ago
- Official code for "Learning Neural Acoustic Fields" (NeurIPS 2022) ☆141 · Updated last year
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆64 · Updated 3 weeks ago
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark ☆48 · Updated 9 months ago
- Codebase for the paper "Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation" (ECCV2020) ☆72 · Updated 4 years ago
- A PyTorch Implementation of Finite Scalar Quantization ☆138 · Updated last year
- ☆17 · Updated 3 years ago
- A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets. ☆97 · Updated 2 years ago