This repo hosts the code and model of MAViL.
☆45 · Jul 24, 2023 · Updated 2 years ago
Alternatives and similar repositories for MAViL
Users who are interested in MAViL are comparing it to the repositories listed below.
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder". ☆287 · Mar 20, 2024 · Updated last year
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆58 · Apr 17, 2024 · Updated last year
- ☆12 · Aug 30, 2017 · Updated 8 years ago
- ☆11 · Sep 1, 2024 · Updated last year
- A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech… ☆12 · Jul 24, 2024 · Updated last year
- Scripts for download AudioSet ☆86 · Nov 7, 2017 · Updated 8 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆22 · Jun 26, 2023 · Updated 2 years ago
- An implementation of capsule routing for sound event detection ☆15 · Jan 29, 2019 · Updated 7 years ago
- ☆14 · Oct 7, 2021 · Updated 4 years ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners ☆106 · Aug 11, 2023 · Updated 2 years ago
- Transfer learning and fine-tuning with YAMNet ☆21 · Jan 20, 2026 · Updated last month
- [ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation ☆78 · Mar 29, 2024 · Updated last year
- This repo contains the code to reproduce the paper: "Enriched Music Representations with Multiple Cross-modal Contrastive Learning" ☆15 · Jun 22, 2023 · Updated 2 years ago
- The audio-visual fusion method for FFIA ☆26 · Aug 5, 2024 · Updated last year
- This is the repository for TimelineQA, a benchmark for querying lifelogs. ☆26 · Jul 5, 2023 · Updated 2 years ago
- Code for the paper: Audio-Visual Model Distillation Using Acoustic Images ☆21 · Mar 24, 2023 · Updated 2 years ago
- A Streamlit app to add structured tags to a dataset card ☆22 · Jun 30, 2022 · Updated 3 years ago
- Simple real-time Sound Event Detector based on YAMNet and pyaudio. ☆24 · Jan 16, 2020 · Updated 6 years ago
- Source code for Consistent ensemble distillation for audio tagging ☆57 · Jun 12, 2025 · Updated 8 months ago
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering querie… ☆31 · Jun 14, 2023 · Updated 2 years ago
- Repository for Weak Label Learning for Audio Events - A closer look. Uses Audioset subset data provided for reproducibility. ☆32 · Sep 13, 2023 · Updated 2 years ago
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations ☆100 · Feb 20, 2026 · Updated last week
- A dataset for Audio-Visual Sound Event Detection in Movies ☆26 · Jan 23, 2023 · Updated 3 years ago
- ☆29 · Apr 3, 2021 · Updated 4 years ago
- The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection" ☆472 · Sep 18, 2025 · Updated 5 months ago
- 📁 This repo makes it easy to download the raw audio files from AudioSet (32.45 GB, 632 classes). ☆104 · Aug 1, 2023 · Updated 2 years ago
- VGGSound: A Large-scale Audio-Visual Dataset ☆351 · Sep 13, 2021 · Updated 4 years ago
- ☆62 · Jun 15, 2025 · Updated 8 months ago
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2. ☆34 · Mar 11, 2025 · Updated 11 months ago
- Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech" ☆28 · Feb 22, 2022 · Updated 4 years ago
- Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043) ☆858 · Sep 30, 2021 · Updated 4 years ago
- A TFLite-compatible fork of YAMNet from tensorflow/models ☆31 · Jun 13, 2020 · Updated 5 years ago
- This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training … ☆330 · Nov 20, 2024 · Updated last year
- Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer". ☆1,424 · May 21, 2023 · Updated 2 years ago
- Philo: uniting modalities ☆26 · Mar 16, 2025 · Updated 11 months ago
- ☆31 · Sep 20, 2021 · Updated 4 years ago
- Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling". ☆39 · Aug 2, 2024 · Updated last year
- GaugeMeterView is view which can be used in different Meter applications ☆12 · Feb 25, 2022 · Updated 4 years ago
- A collection of audio autoencoders, in PyTorch. ☆44 · Mar 7, 2023 · Updated 2 years ago