JiabenChen / iQuery
[CVPR 2023] iQuery: Instruments as Queries for Audio-Visual Sound Separation
☆64 · Updated last year
Alternatives and similar repositories for iQuery:
Users who are interested in iQuery compare it to the repositories listed below:
- [NeurIPS 2023] AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis · ☆27 · Updated last year
- [ECCV2024, Oral, Best Paper Finalist] This is the official implementation of the paper "LEGO: Learning EGOcentric Action Frame Generation … · ☆37 · Updated last month
- Hearing Anything Anywhere Code Release · ☆37 · Updated 9 months ago
- Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022) · ☆33 · Updated 2 years ago
- Bidirectional Mapping between Action Physical-Semantic Space · ☆31 · Updated 6 months ago
- [CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin… · ☆24 · Updated last year
- [CVPR 2023] Egocentric Audio-Visual Object Localization · ☆24 · Updated last year
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation · ☆98 · Updated 4 months ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation · ☆37 · Updated last year
- [ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding · ☆43 · Updated 2 years ago
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models · ☆30 · Updated 10 months ago
- (ICCV2023) IST-Net: Prior-free Category-level Pose Estimation with Implicit Space Transformation · ☆111 · Updated last year
- AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis · ☆10 · Updated 5 months ago
- Download scripts and tools for Replay dataset. · ☆31 · Updated last year
- A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets. · ☆97 · Updated 2 years ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day" · ☆127 · Updated 8 months ago
- Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds" · ☆86 · Updated last year
- [CVPR2023] Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning · ☆18 · Updated 2 years ago
- Code implementation for the paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" · ☆26 · Updated 11 months ago
- Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation · ☆24 · Updated 3 years ago
- SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral) · ☆33 · Updated 3 years ago
- [ECCV 2024 Oral] Audio-Synchronized Visual Animation · ☆47 · Updated 6 months ago