stoneMo/MGN

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/stoneMo/MGN)

stoneMo / MGN

Official implementation for MGN

☆20

Alternatives and similar repositories for MGN

Users that are interested in MGN are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

fyyCS / LSLD
View on GitHub
☆14Nov 13, 2023Updated 2 years ago
weiguoPian / AV-CIL_ICCV2023
View on GitHub
[ICCV 2023] Audio-Visual Class-Incremental Learning
☆35Sep 29, 2024Updated last year
Yu-Wu / Modaily-Aware-Audio-Visual-Video-Parsing
View on GitHub
Code for CVPR 2021 paper Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing
☆24Dec 29, 2021Updated 4 years ago
stoneMo / AVGN
View on GitHub
Official implementation for AVGN
☆41Mar 24, 2023Updated 3 years ago
GeWu-Lab / PSTP-Net
View on GitHub
☆17Aug 11, 2023Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
etzinis / optimal_condition_training
View on GitHub
Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S…
☆14Feb 15, 2023Updated 3 years ago
DTaoo / DMC
View on GitHub
Code for Deep Multimodal Clustering for Unsupervised Audiovisual Learning (CVPR2019)
☆15May 27, 2020Updated 6 years ago
zengchang233 / CrossSinger
View on GitHub
The source code for the paper CrossSinger (asru2023)
☆18Oct 12, 2023Updated 2 years ago
huckiyang / Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial
View on GitHub
Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling
☆15Oct 9, 2023Updated 2 years ago
pedro-morgado / AVSpatialAlignment
View on GitHub
☆31Jun 14, 2022Updated 4 years ago
GeWu-Lab / MUSIC-AVQA
View on GitHub
MUSIC-AVQA, CVPR2022 (ORAL)
☆100Dec 30, 2022Updated 3 years ago
shanwangshan / TAU-urban-audio-visual-scenes
View on GitHub
☆12Oct 23, 2021Updated 4 years ago
GeWu-Lab / LFAV
View on GitHub
Towards Long Form Audio-visual Video Understanding
☆15Jan 16, 2026Updated 6 months ago
colaudiolab / DeepLearning4UTI
View on GitHub
Deep Learning For Ultrasound Tongue Imaging
☆13Dec 17, 2024Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
YapengTian / AVVP-ECCV20
View on GitHub
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆90Jul 25, 2024Updated last year
MengyuanChen21 / CVPR2023-CMPAE
View on GitHub
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
☆37Jun 17, 2023Updated 3 years ago
LiChenda / Multi-clue-TSE-data
View on GitHub
Data simulation scripts for paper "Target Sound Extraction with Variable Cross-modality Clues"
☆17May 19, 2023Updated 3 years ago
Franklin905 / VALOR
View on GitHub
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆17Jul 13, 2025Updated last year
rhgao / co-separation
View on GitHub
Co-Separating Sounds of Visual Objects (ICCV 2019)
☆98Jul 25, 2023Updated 2 years ago
MCG-NJU / JoMoLD
View on GitHub
[ECCV 2022] Joint-Modal Label Denoising for Weakly-Supervised Audio-Visual Video Parsing
☆27Jul 15, 2022Updated 4 years ago
zer0int / CLIP-text-image-interpretability
View on GitHub
Get CLIP ViT text tokens about an image, visualize attention as a heatmap.
☆15Aug 8, 2023Updated 2 years ago
YapengTian / AVE-ECCV18
View on GitHub
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
☆210Apr 3, 2021Updated 5 years ago
GenjiB / LAVISH
View on GitHub
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107Aug 11, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
ttgeng233 / UnAV
View on GitHub
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73Jan 4, 2026Updated 6 months ago
HS-YN / PanoAVQA
View on GitHub
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)
☆16Oct 12, 2021Updated 4 years ago
channelCS / Audio-Vision
View on GitHub
Implementation and reviews of Audio & Computer vision related papers in python using keras and tensorflow.
☆40Nov 1, 2018Updated 7 years ago
JonathanDZ / TF-FaSNet
View on GitHub
☆24Feb 28, 2023Updated 3 years ago
karreny / telling-left-from-right
View on GitHub
Project website for "Telling left from right: Learning spatial correspondence between sight and sound"
☆29Jun 6, 2022Updated 4 years ago
cyh-0 / CAVP
View on GitHub
Official code for "A Closer Look at Audio-Visual Segmentation"
☆97Oct 31, 2025Updated 8 months ago
OpenNLPLab / MMVAE-AVS
View on GitHub
Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023].
☆20Sep 19, 2024Updated last year
ttgeng233 / UniAV
View on GitHub
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33Apr 19, 2024Updated 2 years ago
GeWu-Lab / TSPM
View on GitHub
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17Oct 25, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
afourast / avobjects
View on GitHub
Implementation for ECCV20 paper "Self-Supervised Learning of audio-visual objects from video"
☆114Nov 16, 2020Updated 5 years ago
schowdhury671 / meerkat
View on GitHub
☆35Jul 9, 2025Updated last year
mbzuai-oryx / ARB
View on GitHub
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
☆17May 25, 2025Updated last year
uark-cviu / Right2Talk
View on GitHub
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20Aug 2, 2021Updated 4 years ago
ku-vai / TPoS
View on GitHub
This repository is for The Power of Sound(TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV2023)
☆25Dec 7, 2023Updated 2 years ago
WikiChao / Ego-AV-Loc
View on GitHub
[CVPR 2023] Egocentric Audio-Visual Object Localization
☆27Jan 6, 2024Updated 2 years ago
HimangiM / RepLAI
View on GitHub
Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo…
☆13Oct 23, 2022Updated 3 years ago