AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official PyTorch Implementation (CVPR'25, Highlight)
☆26 · Jun 6, 2025 · Updated 8 months ago
Alternatives and similar repositories for QA-TIGER
Users that are interested in QA-TIGER are comparing it to the libraries listed below
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024. ☆16 · Oct 25, 2024 · Updated last year
- ☆13 · May 21, 2024 · Updated last year
- Official PyTorch Implementation for "TextToucher: Fine-Grained Text-to-Touch Generation" (AAAI 2025) ☆18 · Jan 28, 2026 · Updated 2 weeks ago
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 · Jan 31, 2024 · Updated 2 years ago
- Official implementation of the paper How to Listen? Rethinking Visual Sound Localization ☆17 · Apr 25, 2022 · Updated 3 years ago
- The official repository of "SCANet: Real-Time Face Parsing Using Spatial and Channel Attention," presented at the 2023 UR (Ubiquitous Rob… ☆18 · Sep 15, 2023 · Updated 2 years ago
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ☆22 · Aug 18, 2025 · Updated 5 months ago
- Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024 ☆27 · Mar 26, 2024 · Updated last year
- ☆29 · Jul 25, 2025 · Updated 6 months ago
- MUSIC-AVQA, CVPR2022 (ORAL) ☆95 · Dec 30, 2022 · Updated 3 years ago
- ☆18 · May 16, 2021 · Updated 4 years ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆58 · Sep 4, 2024 · Updated last year
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆32 · May 27, 2025 · Updated 8 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆38 · Jan 26, 2026 · Updated 3 weeks ago
- Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation ☆41 · Dec 23, 2023 · Updated 2 years ago
- Official code for PixMamba ☆38 · Feb 5, 2025 · Updated last year
- (CVPR2023) official code of Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization ☆33 · Sep 5, 2023 · Updated 2 years ago
- Collection with descriptions of super-resolution-related papers, repositories, datasets, loss functions, etc. ☆11 · Dec 12, 2023 · Updated 2 years ago
- Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language ☆86 · Jun 12, 2024 · Updated last year
- ☆36 · Jul 9, 2025 · Updated 7 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. ☆82 · Dec 24, 2025 · Updated last month
- [CVPR 2025] Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation ☆19 · Dec 18, 2025 · Updated last month
- UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆35 · Dec 29, 2025 · Updated last month
- Multi-Agent LLM System for Digital Scam Protection ☆12 · Dec 19, 2024 · Updated last year
- Implementation for "StyleGAN-Canvas: Augmenting StyleGAN3 for Real-Time Human-AI Co-Creation" ☆11 · May 24, 2023 · Updated 2 years ago
- ☆13 · Apr 14, 2025 · Updated 10 months ago
- The official implementation of paper "TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models" ☆16 · Mar 11, 2025 · Updated 11 months ago
- ☆10 · Oct 13, 2024 · Updated last year
- Image Search Engine with HuggingFace Sentence Transformer ☆12 · Aug 31, 2023 · Updated 2 years ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec" ☆13 · Jan 27, 2025 · Updated last year
- Virtual character locomotion system. See the article “Motion Graphs”, Lucas Kovar, 2002 ☆12 · Mar 1, 2012 · Updated 13 years ago
- An overview of popular reranking models and architectures for 2-stage RAG pipelines ☆20 · Jun 10, 2025 · Updated 8 months ago
- ☆14 · Sep 17, 2024 · Updated last year
- ☆15 · Sep 14, 2025 · Updated 5 months ago
- Speech Security and Privacy Compendium - Mini ☆10 · Jun 18, 2024 · Updated last year
- This is the official PyTorch code for our paper "Artemis: Structured Visual Reasoning for Perception Policy Learning". ☆14 · Dec 4, 2025 · Updated 2 months ago
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023 ☆12 · Aug 24, 2025 · Updated 5 months ago
- Official Repository for "Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge" (CVPR 2024) ☆13 · Sep 1, 2024 · Updated last year
- [ACM MM-24] Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization ☆12 · Oct 8, 2024 · Updated last year