Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 · Apr 20, 2023 · Updated 2 years ago
Alternatives and similar repositories for uavm
Users that are interested in uavm are comparing it to the libraries listed below.
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder". ☆288 · Mar 20, 2024 · Updated 2 years ago
- Sapsucker Woods 60 Audiovisual Dataset ☆18 · Oct 7, 2022 · Updated 3 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆58 · Apr 17, 2024 · Updated last year
- ☆43 · Feb 21, 2023 · Updated 3 years ago
- Unsupervised spoken sentence embeddings ☆14 · Dec 14, 2022 · Updated 3 years ago
- A dataset for Audio-Visual Sound Event Detection in Movies ☆26 · Jan 23, 2023 · Updated 3 years ago
- experiments about AudioSet ☆43 · Jul 22, 2023 · Updated 2 years ago
- Code for "Phoneme Segmentation Using Self-Supervised Speech Models", Strgar & Harwath, Proceedings of the IEEE Spoken Language Technology… ☆55 · Nov 4, 2022 · Updated 3 years ago
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations ☆100 · Feb 20, 2026 · Updated last month
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning ☆54 · Jan 18, 2024 · Updated 2 years ago
- Code for the C2KD paper (ICASSP 2023) ☆19 · May 15, 2023 · Updated 2 years ago
- ☆13 · Nov 15, 2024 · Updated last year
- A curated list of audio-visual learning methods and datasets. ☆286 · Dec 3, 2024 · Updated last year
- Cross-model active contrastive coding ☆22 · Mar 17, 2021 · Updated 5 years ago
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight) ☆90 · Jul 25, 2024 · Updated last year
- Code and dataset release for "PACS: A Dataset for Physical Audiovisual CommonSense Reasoning" (ECCV 2022) ☆17 · Dec 20, 2022 · Updated 3 years ago
- Splits for epic-sounds dataset ☆86 · Aug 2, 2025 · Updated 7 months ago
- ☆10 · Apr 17, 2024 · Updated last year
- Implementation of Zorro, Masked Multimodal Transformer, in Pytorch ☆98 · Oct 20, 2023 · Updated 2 years ago
- Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation". ☆150 · Jul 13, 2023 · Updated 2 years ago
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023) ☆12 · Jun 1, 2023 · Updated 2 years ago
- ☆14 · Oct 7, 2021 · Updated 4 years ago
- Interspeech Tutorial - Resource Efficient and Cross-Modal Learning Toward Foundation Modeling ☆15 · Oct 9, 2023 · Updated 2 years ago
- Word Discovery in Visually Grounded, Self-Supervised Speech Models ☆26 · Dec 4, 2023 · Updated 2 years ago
- Non-Autoregressive Predictive Coding ☆51 · Nov 3, 2020 · Updated 5 years ago
- ☆23 · Jun 24, 2024 · Updated last year
- Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation" in NeurIPS… ☆14 · Dec 9, 2021 · Updated 4 years ago
- AuxFormer: Robust Approach to Audiovisual Emotion Recognition ☆14 · Mar 14, 2023 · Updated 3 years ago
- Code for the ICML 2025 paper "SelfCite Self-Supervised Alignment for Context Attribution in Large Language Models" ☆24 · Mar 12, 2026 · Updated 2 weeks ago
- The Pytorch implementation of paper: Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training ☆50 · Dec 17, 2024 · Updated last year
- VGGSound: A Large-scale Audio-Visual Dataset ☆355 · Sep 13, 2021 · Updated 4 years ago
- Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech" ☆28 · Feb 22, 2022 · Updated 4 years ago
- ICCV 2021 ☆34 · May 11, 2022 · Updated 3 years ago
- ☆16 · Sep 20, 2022 · Updated 3 years ago
- Audio Visual Instance Discrimination with Cross-Modal Agreement ☆131 · Aug 13, 2021 · Updated 4 years ago
- MusAV: a dataset of relative arousal-valence annotations for validation of audio models ☆17 · Dec 16, 2022 · Updated 3 years ago
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners ☆107 · Aug 11, 2023 · Updated 2 years ago
- The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection" ☆480 · Sep 18, 2025 · Updated 6 months ago