Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)
☆26Jun 6, 2025Updated 9 months ago
Alternatives and similar repositories for QA-TIGER
Users who are interested in QA-TIGER are comparing it to the repositories listed below
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆16Oct 25, 2024Updated last year
- ☆13May 21, 2024Updated last year
- Official Pytorch Implementation for "TextToucher: Fine-Grained Text-to-Touch Generation" (AAAI 2025)☆19Jan 28, 2026Updated last month
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Jan 31, 2024Updated 2 years ago
- Official implementation of the paper How to Listen? Rethinking Visual Sound Localization☆18Apr 25, 2022Updated 3 years ago
- The official repository of "SCANet: Real-Time Face Parsing Using Spatial and Channel Attention," presented at the 2023 UR (Ubiquitous Rob…☆18Sep 15, 2023Updated 2 years ago
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal …☆23Aug 18, 2025Updated 6 months ago
- Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024☆27Mar 26, 2024Updated last year
- ☆29Jul 25, 2025Updated 7 months ago
- MUSIC-AVQA, CVPR2022 (ORAL)☆96Dec 30, 2022Updated 3 years ago
- ☆18May 16, 2021Updated 4 years ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…☆58Sep 4, 2024Updated last year
- [2025 CVPR] Towards Open-Vocabulary Audio-Visual Event Localization☆42Mar 7, 2025Updated last year
- Code for Visual Sound Localization in the Wild by Cross-Modal Interference Erasing (AAAI 2022).☆29Feb 15, 2022Updated 4 years ago
- AQUA dataset and VIKING model for the task of Art Visual Question Answering☆27Jun 4, 2021Updated 4 years ago
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval☆104Nov 4, 2025Updated 4 months ago
- A curated list of audio-visual learning methods and datasets.☆286Dec 3, 2024Updated last year
- ☆31Mar 24, 2022Updated 3 years ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation☆41Dec 23, 2023Updated 2 years ago
- Official code for PixMamba☆38Feb 5, 2025Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆42Updated this week
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)☆56Jun 9, 2025Updated 9 months ago
- Collection, with descriptions, of super-resolution-related papers, repositories, datasets, loss functions, etc.☆11Dec 12, 2023Updated 2 years ago
- official code for unigame☆19Nov 26, 2025Updated 3 months ago
- Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language☆86Jun 12, 2024Updated last year
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆84Dec 24, 2025Updated 2 months ago
- ☆36Jul 9, 2025Updated 8 months ago
- Multi-Agent LLM System for Digital Scam Protection☆12Dec 19, 2024Updated last year
- ☆10Oct 13, 2024Updated last year
- Official implementation of "Attention-aware semantic communications for collaborative inference" (IEEE IoTJ 2024)☆13Jan 22, 2026Updated last month
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models☆36Feb 21, 2026Updated 2 weeks ago
- [CVPR 2025] Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation☆19Dec 18, 2025Updated 2 months ago
- ☆13Apr 14, 2025Updated 10 months ago
- Repository for the code assignment of the Deep Learning 1 course, Fall 2021 edition☆10Oct 31, 2022Updated 3 years ago
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment"☆10May 5, 2024Updated last year
- For the convenience of those preparing for the postgraduate entrance exam☆10Sep 8, 2021Updated 4 years ago
- script to extract frames from HMDB51 dataset and create train, test and val split☆10Feb 26, 2019Updated 7 years ago
- ☆14Aug 28, 2024Updated last year
- This repo provides the codebase for "A General Framework for Weak Supervision"☆40Jun 3, 2024Updated last year