ttgeng233/UnAV

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ttgeng233/UnAV)

ttgeng233 / UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

☆73

Alternatives and similar repositories for UnAV

Users that are interested in UnAV are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ttgeng233 / UniAV
View on GitHub
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33Apr 19, 2024Updated 2 years ago
GeWu-Lab / LFAV
View on GitHub
Towards Long Form Audio-visual Video Understanding
☆15Jan 16, 2026Updated 6 months ago
Franklin905 / VALOR
View on GitHub
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆17Jul 13, 2025Updated last year
schowdhury671 / meerkat
View on GitHub
☆35Jul 9, 2025Updated last year
ttgeng233 / LongVALE
View on GitHub
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025))
☆61Jun 9, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
fyyCS / LSLD
View on GitHub
☆14Nov 13, 2023Updated 2 years ago
marmot-xy / CMBS
View on GitHub
cross modal background suppression for audio-visual event localization
☆36Mar 18, 2022Updated 4 years ago
MengyuanChen21 / CVPR2023-CMPAE
View on GitHub
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
☆37Jun 17, 2023Updated 3 years ago
NIneeeeeem / LangDC
View on GitHub
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18Sep 7, 2025Updated 10 months ago
YapengTian / AVVP-ECCV20
View on GitHub
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆90Jul 25, 2024Updated last year
kaiw7 / STG-CMA
View on GitHub
Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
☆15Apr 13, 2024Updated 2 years ago
jasongief / OV-AVEL
View on GitHub
[2025 CVPR] Towards Open-Vocabulary Audio-Visual Event Localization
☆46Mar 7, 2025Updated last year
GeWu-Lab / MUSIC-AVQA
View on GitHub
MUSIC-AVQA, CVPR2022 (ORAL)
☆100Dec 30, 2022Updated 3 years ago
zjr2000 / Untrimmed-Video-Feature-Extractor
View on GitHub
A simple and effective feature extractor for untrimmed videos
☆13Sep 1, 2022Updated 3 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
zzhhfut / CCNet-AAAI2025
View on GitHub
This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal …
☆24Aug 18, 2025Updated 11 months ago
jasongief / CPSP
View on GitHub
[2022 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
☆32Mar 6, 2023Updated 3 years ago
lxa9867 / QSD
View on GitHub
[CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"
☆12Feb 27, 2024Updated 2 years ago
zjr2000 / LLMVA-GEBC
View on GitHub
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆29Jan 1, 2024Updated 2 years ago
JustinYuu / MM_Pyramid
View on GitHub
[ACM MM 2022] MM_Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
☆15Aug 26, 2022Updated 3 years ago
zjr2000 / Awesome-Multimodal-Chatbot
View on GitHub
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction…
☆79Jun 18, 2023Updated 3 years ago
TencentARC / FLM
View on GitHub
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆31May 15, 2023Updated 3 years ago
GenjiB / LAVISH
View on GitHub
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107Aug 11, 2023Updated 2 years ago
zihuixue / MKE
View on GitHub
[ICCV 2021] Multimodal Knowledge Expansion
☆10Aug 28, 2021Updated 4 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
epic-kitchens / epic-sounds-annotations
View on GitHub
Splits for epic-sounds dataset
☆85Aug 2, 2025Updated 11 months ago
MCG-NJU / JoMoLD
View on GitHub
[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
☆27Jul 15, 2022Updated 4 years ago
zjr2000 / GVL
View on GitHub
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
☆28Dec 8, 2023Updated 2 years ago
GeWu-Lab / TSPM
View on GitHub
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17Oct 25, 2024Updated last year
GeWu-Lab / awesome-audiovisual-learning
View on GitHub
A curated list of audio-visual learning methods and datasets.
☆288Dec 3, 2024Updated last year
MengyuanChen21 / CVPR2023-OWTAL
View on GitHub
[CVPR 2023] Cascade Evidential Learning for Open-world Weakly-supervised Temporal Action Localization
☆12Jul 9, 2024Updated 2 years ago
YapengTian / AVE-ECCV18
View on GitHub
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
☆210Apr 3, 2021Updated 5 years ago
yunlong10 / AVicuna
View on GitHub
[AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
☆34Mar 21, 2025Updated last year
cyh-0 / CAVP
View on GitHub
Official code for "A Closer Look at Audio-Visual Segmentation"
☆97Oct 31, 2025Updated 8 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
stoneMo / MGN
View on GitHub
Official implementation for MGN
☆20Dec 22, 2022Updated 3 years ago
haoyi-duan / DG-SCT
View on GitHub
NeurIPS'2023 official implementation code
☆70Nov 11, 2023Updated 2 years ago
OpenNLPLab / AVSBench
View on GitHub
[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)
☆420Nov 18, 2024Updated last year
ubc-tea / FedSoup
View on GitHub
The official Pytorch implementation of paper "FedSoup: Improving Generalization and Personalization in Federated Learning via Selective M…
☆18Apr 14, 2024Updated 2 years ago
jasongief / LEAP
View on GitHub
[2024 ECCV] Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
☆14Nov 17, 2024Updated last year
zjr2000 / SPES
View on GitHub
Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"
☆23May 8, 2026Updated 2 months ago
JacobChalk / TIM
View on GitHub
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆54Nov 7, 2024Updated last year