GeWu-Lab / CrabLinks

[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

☆75

Alternatives and similar repositories for Crab

Users that are interested in Crab are comparing it to the libraries listed below

Sorting:

yannqi / COMBO-AVS
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…
☆39Updated 7 months ago
jasongief / OV-AVEL
[2025 CVPR] Towards Open-Vocabulary Audio-Visual Event Localization
☆36Updated 8 months ago
ruohaoguo / ovavss
Official Implementation of "Open-Vocabulary Audio-Visual Semantic Segmentation" [ACM MM 2024 Oral].
☆35Updated last year
schowdhury671 / meerkat
☆34Updated 4 months ago
GeWu-Lab / Generalizable-Audio-Visual-Segmentation
Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024
☆24Updated last year
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17Updated last year
GeWu-Lab / Ref-AVS
The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024
☆47Updated last month
jinxiang-liu / anno-free-AVS
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
☆35Updated last year
ttgeng233 / LongVALE
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))
☆52Updated 5 months ago
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆57Updated last year
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆106Updated 2 years ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆30Updated last year
ruohaoguo / avis
[CVPR 2025] 🔥 Official impl. of "Audio-Visual Instance Segmentation".
☆37Updated 5 months ago
zzhhfut / CCNet-AAAI2025
This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal …
☆21Updated 3 months ago
Franklin905 / VALOR
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆18Updated 4 months ago
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆69Updated last year
stoneMo / DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
☆35Updated last year
yunlong10 / AVicuna
[AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
☆33Updated 7 months ago
JacobChalk / TIM
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆46Updated last year
VisualAIKHU / Missing-AVQA
Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024)
☆14Updated last year
vvvb-github / AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
☆70Updated 8 months ago
AV-Reasoner / AV-Reasoner
☆17Updated 3 months ago
ispamm / GRAM
Official PyTorch repository for GRAM
☆103Updated 6 months ago
AlyssaYoung / AVQA
ACM MM 2022 paper_AVQA: A Dataset for Audio-Visual Question Answering on Videos
☆12Updated 2 years ago
GeWu-Lab / Stepping-Stones
The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024
☆17Updated last year
HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi…
☆63Updated 6 months ago
Lzq5 / Video-Text-Alignment
☆25Updated 4 months ago
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆169Updated 8 months ago
xxayt / MGSV
[ICCV 2025] This repo is the official implementation of "Music Grounding by Short Video"
☆24Updated 2 months ago
fyyCS / LSLD
☆14Updated 2 years ago