YuanGongND/uavm

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/YuanGongND/uavm)

YuanGongND / uavm

Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".

☆57

Alternatives and similar repositories for uavm

Users that are interested in uavm are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

YuanGongND / cav-mae
View on GitHub
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
☆292Mar 20, 2024Updated 2 years ago
visipedia / ssw60
View on GitHub
Sapsucker Woods 60 Audiovisual Dataset
☆19Oct 7, 2022Updated 3 years ago
roger-tseng / av-superb
View on GitHub
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58Apr 17, 2024Updated 2 years ago
sony / CLIPSep
View on GitHub
☆43Feb 21, 2023Updated 3 years ago
lingjzhu / spoken_sent_embedding
View on GitHub
Unsupervised spoken sentence embeddings
☆14Dec 14, 2022Updated 3 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
usc-sail / mica-subtitle-aligned-movie-sounds
View on GitHub
A dataset for Audio-Visual Sound Event Detection in Movies
☆26Jan 23, 2023Updated 3 years ago
lijuncheng16 / AudioTaggingDoneRight
View on GitHub
experiments about AudioSet
☆43Jul 22, 2023Updated 3 years ago
lstrgar / ss-phoneme-seg
View on GitHub
Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology…
☆55Nov 4, 2022Updated 3 years ago
nttcslab / msm-mae
View on GitHub
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
☆99Feb 20, 2026Updated 5 months ago
Alexander-H-Liu / dinosr
View on GitHub
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
☆53Jan 18, 2024Updated 2 years ago
roudimit / c2kd
View on GitHub
Code for the C2KD paper (ICASSP 2023)
☆20May 15, 2023Updated 3 years ago
3loi / MSP_Face
View on GitHub
☆13Nov 15, 2024Updated last year
GeWu-Lab / awesome-audiovisual-learning
View on GitHub
A curated list of audio-visual learning methods and datasets.
☆289Dec 3, 2024Updated last year
stoneMo / OneAVM
View on GitHub
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12Jun 1, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
yunyikristy / CM-ACC
View on GitHub
Cross-model active contrastive coding
☆22Mar 17, 2021Updated 5 years ago
YapengTian / AVVP-ECCV20
View on GitHub
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆90Jul 25, 2024Updated 2 years ago
epic-kitchens / epic-sounds-annotations
View on GitHub
Splits for epic-sounds dataset
☆85Aug 2, 2025Updated 11 months ago
lucidrains / zorro-pytorch
View on GitHub
Implementation of Zorro, Masked Multimodal Transformer, in Pytorch
☆98Oct 20, 2023Updated 2 years ago
YuanGongND / psla
View on GitHub
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
☆150Jul 13, 2023Updated 3 years ago
lwang114 / GraphUnsupASR
View on GitHub
☆10Apr 17, 2024Updated 2 years ago
WangHelin1997 / MaskSpec
View on GitHub
The Pytorch implementation of paper: Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
☆51Dec 17, 2024Updated last year
jasonppy / word-discovery
View on GitHub
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27Dec 4, 2023Updated 2 years ago
yunyikristy / global_local
View on GitHub
☆14Oct 7, 2021Updated 4 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
huckiyang / Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial
View on GitHub
Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling
☆15Oct 9, 2023Updated 2 years ago
Alexander-H-Liu / NPC
View on GitHub
Non-Autoregressive Predictive Coding
☆51Nov 3, 2020Updated 5 years ago
guotaowang / STANet
View on GitHub
☆16Sep 20, 2022Updated 3 years ago
yiren-jian / NonLing-CSE
View on GitHub
[NeurIPS 2022] Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
☆22Jan 30, 2023Updated 3 years ago
hche11 / VGGSound
View on GitHub
VGGSound: A Large-scale Audio-Visual Dataset
☆359Sep 13, 2021Updated 4 years ago
YuanGongND / llm_speech_emotion_challenge
View on GitHub
☆23Jun 24, 2024Updated 2 years ago
ubc-vision / TriBERT
View on GitHub
Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation" in NeurIPS…
☆14Dec 9, 2021Updated 4 years ago
brian7685 / Multimodal-Clustering-Network
View on GitHub
ICCV 2021
☆34May 11, 2022Updated 4 years ago
wnhsu / ResDAVEnet-VQ
View on GitHub
Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"
☆28Feb 22, 2022Updated 4 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
samuelyu2002 / PACS
View on GitHub
Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022)
☆18Dec 20, 2022Updated 3 years ago
facebookresearch / AVID-CMA
View on GitHub
Audio Visual Instance Discrimination with Cross-Modal Agreement
☆133Aug 13, 2021Updated 4 years ago
pprablanc / ppsrt
View on GitHub
A python algorithm to change the pitch of the voice in real time
☆13Dec 13, 2020Updated 5 years ago
GATECH-EIC / S3-Router
View on GitHub
[NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee…
☆17Sep 19, 2023Updated 2 years ago
MTG / musav-dataset
View on GitHub
MusAV: a dataset of relative arousal-valence annotations for validation of audio models
☆17Dec 16, 2022Updated 3 years ago
RetroCirce / HTS-Audio-Transformer
View on GitHub
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
☆504Sep 18, 2025Updated 10 months ago
GenjiB / LAVISH
View on GitHub
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107Aug 11, 2023Updated 2 years ago