brian7685/Multimodal-Clustering-Network

ICCV 2021

☆34

Alternatives and similar repositories for Multimodal-Clustering-Network

Users who are interested in Multimodal-Clustering-Network are comparing it to the repositories listed below.

rxtan2 / video-grounding-narrations
☆12 · Mar 12, 2023 · Updated 3 years ago
stoneMo / SLAVC
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
☆22 · Dec 6, 2022 · Updated 3 years ago
Franklin905 / VALOR
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆17 · Jul 13, 2025 · Updated last year
DTaoo / DMC
Code for Deep Multimodal Clustering for Unsupervised Audiovisual Learning (CVPR2019)
☆15 · May 27, 2020 · Updated 6 years ago
ubc-vision / TriBERT
Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation" in NeurIPS…
☆14 · Dec 9, 2021 · Updated 4 years ago
stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
fyyCS / LSLD
☆14 · Nov 13, 2023 · Updated 2 years ago
DTaoo / Discriminative-Sounding-Objects-Localization
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)
☆61 · Jan 19, 2022 · Updated 4 years ago
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆288 · Dec 3, 2024 · Updated last year
thuiar / TEXTOIR-DEMO
TEXTOIR: An Integrated and Visualized Platform for Text Open Intent Recognition (ACL 2021)
☆55 · Sep 11, 2022 · Updated 3 years ago
YapengTian / AVVP-ECCV20
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆90 · Jul 25, 2024 · Updated 2 years ago
jasongief / OV-AVEL
[2025 CVPR] Towards Open-Vocabulary Audio-Visual Event Localization
☆46 · Mar 7, 2025 · Updated last year
stoneMo / AVGN
Official implementation for AVGN
☆42 · Mar 24, 2023 · Updated 3 years ago
hohsiangwu / rethinking-visual-sound-localization
Official implementation of the paper How to Listen? Rethinking Visual Sound Localization
☆18 · Apr 25, 2022 · Updated 4 years ago
marmot-xy / CMBS
cross modal background suppression for audio-visual event localization
☆36 · Mar 18, 2022 · Updated 4 years ago
schowdhury671 / meerkat
☆35 · Jul 9, 2025 · Updated last year
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 · Apr 20, 2023 · Updated 3 years ago
GeWu-Lab / Valuate-and-Enhance-Multimodal-Cooperation
The repo for "Enhancing Multi-modal Cooperation via Sample-level Modality Valuation", CVPR 2024
☆62 · Nov 5, 2024 · Updated last year
oncescuandreea / QuerYD_downloader
☆23 · Dec 5, 2023 · Updated 2 years ago
Lzq5 / Video-Text-Alignment
☆28 · Jul 18, 2025 · Updated last year
kaiw7 / STG-CMA
Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
☆15 · Apr 13, 2024 · Updated 2 years ago
pritamqu / XKD
[AAAI 2024] XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning.
☆15 · Jul 9, 2024 · Updated 2 years ago
hiyouga / AMP-Poster-Slides-LaTeX
LaTeX Poster and Slides for AMP (CVPR 2021)
☆33 · May 31, 2021 · Updated 5 years ago
QingyangZhang / awesome-low-quality-multimodal-learning
☆54 · Dec 30, 2024 · Updated last year
IMKBLE / DAMC
☆10 · Sep 24, 2021 · Updated 4 years ago
thuiar / UMC
Unsupervised Multimodal Clustering for Semantics Discovery in Multimodal Utterances (ACL 2024)
☆31 · Dec 7, 2024 · Updated last year
sangminwoo / ActionMAE
[AAAI 2023 Oral] Official pytorch implementation of "Towards Good Practices for Missing Modality Robust Action Recognition"
☆23 · Dec 1, 2022 · Updated 3 years ago
aqibahmad / speech2face
A PyTorch implementation of MIT CSAIL's Speech2Face research paper from IEEE CVPR 2019
☆12 · Mar 25, 2023 · Updated 3 years ago
fengyang0317 / STVR
☆10 · May 30, 2019 · Updated 7 years ago
2han9x1a0release / RLCC
☆12 · Jun 27, 2022 · Updated 4 years ago
VALUE-Leaderboard / DataRelease
Data Release for VALUE Benchmark
☆30 · Feb 16, 2022 · Updated 4 years ago
qingzwang / AudioVisualCrowdCounting
☆18 · May 13, 2022 · Updated 4 years ago
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107 · Aug 11, 2023 · Updated 2 years ago
Wusiwei0410 / SciMMIR
☆25 · Aug 1, 2024 · Updated last year
mugen-org / MUGEN_coinrun
A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset. This repo contains scripts …
☆13 · Jul 13, 2022 · Updated 4 years ago
edsonroteia / cav-mae-sync
[CVPR25] Official Implementation of CAV-MAE Sync
☆31 · Apr 5, 2026 · Updated 3 months ago
WikiChao / Ego-AV-Loc
[CVPR 2023] Egocentric Audio-Visual Object Localization
☆27 · Jan 6, 2024 · Updated 2 years ago
OpenNLPLab / MMVAE-AVS
Multimodal Variational Auto-encoder based Audio-Visual Segmentation [ICCV2023].
☆20 · Sep 19, 2024 · Updated last year
expectorlin / DR-Attacker
code for the paper "Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation" (TPAMI 2021)
☆10 · Jul 15, 2022 · Updated 4 years ago