reml-group / MUSIC-AVQA-R

☆13

Alternatives and similar repositories for MUSIC-AVQA-R

Users who are interested in MUSIC-AVQA-R are comparing it to the repositories listed below

jasongief / OV-AVEL
[2025 CVPR] Towards Open-Vocabulary Audio-Visual Event Localization
☆31 Updated 7 months ago
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17 Updated last year
schowdhury671 / meerkat
☆33 Updated 3 months ago
AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)
☆23 Updated 4 months ago
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆68 Updated last year
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆103 Updated 2 years ago
jinxiang-liu / anno-free-AVS
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
☆34 Updated last year
ExplainableML / AVCA-GZSL
This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and …
☆39 Updated 2 years ago
naver-ai / pcmepp
Official Pytorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)
☆57 Updated last year
GeWu-Lab / MUSIC-AVQA
MUSIC-AVQA, CVPR2022 (ORAL)
☆90 Updated 2 years ago
Franklin905 / VALOR
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆18 Updated 3 months ago
fyyCS / LSLD
☆14 Updated last year
JacobChalk / TIM
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆46 Updated 11 months ago
stoneMo / CIGN
Official implementation for CIGN
☆16 Updated 2 years ago
AV-Reasoner / AV-Reasoner
☆17 Updated 3 months ago
Lzq5 / Video-Text-Alignment
☆25 Updated 3 months ago
OpenNLPLab / MMVAE-AVS
Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023].
☆19 Updated last year
weiguoPian / AV-CIL_ICCV2023
☆30 Updated last year
GeWu-Lab / Generalizable-Audio-Visual-Segmentation
Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024
☆24 Updated last year
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆57 Updated last year
YapengTian / AVVP-ECCV20
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆90 Updated last year
ttgeng233 / LongVALE
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)
☆51 Updated 4 months ago
yannqi / COMBO-AVS
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…
☆39 Updated 6 months ago
stoneMo / DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
☆34 Updated last year
vvvb-github / AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
☆68 Updated 7 months ago
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆61 Updated last year
sangmin-git / MMSI
Code for "Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations" (CVPR 2024 Oral)
☆16 Updated last year
stoneMo / MGN
Official implementation for MGN
☆20 Updated 2 years ago
RERV / UniAdapter
[ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …
☆76 Updated last year
zzhhfut / CCNet-AAAI2025
This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal …
☆21 Updated 2 months ago