AlyssaYoung / AVQA
ACM MM 2022 paper_AVQA: A Dataset for Audio-Visual Question Answering on Videos
☆15 Aug 17, 2023 Updated 2 years ago
Alternatives and similar repositories for AVQA
Users who are interested in AVQA are comparing it to the repositories listed below
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024. ☆16 Oct 25, 2024 Updated last year
- MUSIC-AVQA, CVPR2022 (ORAL) ☆94 Dec 30, 2022 Updated 3 years ago
- ☆13 May 21, 2024 Updated last year
- ☆17 Aug 11, 2023 Updated 2 years ago
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation ☆81 Dec 24, 2025 Updated last month
- ☆21 Mar 18, 2023 Updated 2 years ago
- ☆36 Jul 9, 2025 Updated 7 months ago
- ☆15 Sep 16, 2024 Updated last year
- An interpreter in C for the language brainfuck. ☆10 Apr 12, 2023 Updated 2 years ago
- [ACL 2025 Main] MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration ☆22 Jun 8, 2025 Updated 8 months ago
- The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matchin… ☆10 Feb 9, 2025 Updated last year
- Connected Papers knockoff, managing academic papers and citations with graph database. ☆12 Dec 26, 2023 Updated 2 years ago
- ☆11 Jun 25, 2024 Updated last year
- This is the official Pytorch code for our paper "Artemis: Structured Visual Reasoning for Perception Policy Learning". ☆14 Dec 4, 2025 Updated 2 months ago
- Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Finding]" ☆15 Aug 27, 2025 Updated 5 months ago
- [ACL 2021] This is the Pytorch code for our paper "Semantic Relation-aware Difference Representation Learning for Change Captioning". ☆13 Jan 16, 2022 Updated 4 years ago
- posters for all CVPR2024 Award papers (Highlight and Oral) ☆13 Jul 9, 2024 Updated last year
- A repository for using the distributed information bottleneck to locate information in data ☆17 Aug 26, 2024 Updated last year
- [CVPR 2025] Official implementation of paper "Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free … ☆17 Aug 26, 2025 Updated 5 months ago
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re… ☆13 Oct 8, 2025 Updated 4 months ago
- Simulation code for the paper "FedSL: Federated Split Learning for Collaborative Healthcare Analytics on Resource-Constrained Wearable Io… ☆16 Feb 2, 2024 Updated 2 years ago
- A curated list of resources in audio visual question answering and related area. :-) ☆17 Jun 29, 2025 Updated 7 months ago
- This repository contains the Python implementation of our submitted paper titled "Deep Reinforcement Learning for Joint Trajectory and Co… ☆15 Jun 29, 2024 Updated last year
- ☆15 Mar 29, 2023 Updated 2 years ago
- [CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception ☆20 Jun 17, 2025 Updated 7 months ago
- Extract MFCCs from videos and make bag-of-audio-words (BOAW) representations. ☆11 Dec 20, 2018 Updated 7 years ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner. ☆119 Oct 17, 2025 Updated 3 months ago
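- A library wrapping the Baidu, SinoVoice (捷通华声), and iFlytek speech recognition services, plus wrappers for SinoVoice, ethnic-language (民族语文), and NiuTrans translation. ☆15 Sep 10, 2019 Updated 6 years ago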
- AN INTERACTIVE REMOTE SENSING CHANGE ANALYSIS MODEL BASED ON MULTIMODAL INSTRUCTION TUNING ☆19 Jun 16, 2025 Updated 7 months ago
- Quantized Generative Semantic Communication framework ☆13 Sep 17, 2024 Updated last year
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆12 Jun 11, 2024 Updated last year
- ☆14 Nov 13, 2023 Updated 2 years ago
- 16k Hz Vocoder (HiFiGAN Codes and Pretrained Models) ☆18 Apr 3, 2023 Updated 2 years ago
- ☆15 Jan 20, 2022 Updated 4 years ago
- Extra notebooks for CCRMA MIR workshop, 2018 edition ☆13 Jun 28, 2018 Updated 7 years ago
- ☆14 Dec 23, 2023 Updated 2 years ago
- Code for EMNLP 2022 main conference paper "Low-resource Neural Machine Translation with Cross-modal Alignment". ☆14 Apr 25, 2023 Updated 2 years ago
- [IEEE TMM 2023] This is the Pytorch code for our paper "Neighborhood Contrastive Transformer for Change Captioning". ☆12 Aug 30, 2023 Updated 2 years ago
- implementation of TDConvED for video captioning ☆13 Mar 18, 2020 Updated 5 years ago