Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
☆140 · Feb 23, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for m2d
Users who are interested in m2d are comparing it to the libraries listed below.
- EVAR ~ Evaluation package for Audio Representations ☆75 · Feb 19, 2026 · Updated last month
- A library built for easier audio self-supervised training, downstream tasks evaluation ☆136 · Sep 25, 2025 · Updated 5 months ago
- Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer". ☆417 · Aug 14, 2022 · Updated 3 years ago
- ISMIR 24 Supplementary Material ☆14 · Oct 28, 2024 · Updated last year
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models ☆22 · Jul 10, 2024 · Updated last year
- Dual Bayesian ResNet: A Deep Learning Approach to Heart Murmur Detection (Physionet Challenge 2022) ☆23 · Oct 1, 2025 · Updated 5 months ago
- JEPAs for audio representation learning ☆19 · Jun 22, 2025 · Updated 9 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ☆49 · Jan 19, 2026 · Updated 2 months ago
- ☆116 · May 13, 2025 · Updated 10 months ago
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations ☆100 · Feb 20, 2026 · Updated last month
- This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training … ☆334 · Nov 20, 2024 · Updated last year
- ☆117 · Feb 26, 2026 · Updated 3 weeks ago
- The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection" ☆476 · Sep 18, 2025 · Updated 6 months ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆221 · Nov 30, 2025 · Updated 3 months ago
- State-of-the-art pretrained music models for training, evaluation, inference ☆166 · Jan 20, 2026 · Updated 2 months ago
- A standardized toolkit of Kernel Audio Distance (KAD), a distribution-free, unbiased, and computationally efficient metric for evaluating … ☆96 · Jun 12, 2025 · Updated 9 months ago
- Multi-lingual AudioCaps ☆12 · Nov 20, 2023 · Updated 2 years ago
- ☆30 · Jun 22, 2022 · Updated 3 years ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆83 · Nov 7, 2025 · Updated 4 months ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model ☆135 · Nov 5, 2025 · Updated 4 months ago
- Official Implementation of GLAP - General Language Audio Pretraining ☆65 · Jan 5, 2026 · Updated 2 months ago
- This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection". ☆160 · Aug 24, 2025 · Updated 6 months ago
- Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023) ☆72 · Mar 11, 2025 · Updated last year
- Official PyTorch implementation of Contrastive Learning of Musical Representations ☆335 · Jul 25, 2024 · Updated last year
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆24 · Oct 8, 2025 · Updated 5 months ago
- ☆40 · Feb 18, 2026 · Updated last month
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆101 · Jul 24, 2024 · Updated last year
- ☆43 · Feb 21, 2023 · Updated 3 years ago
- NeMo: a toolkit for conversational AI ☆13 · May 4, 2024 · Updated last year
- ☆14 · Nov 22, 2022 · Updated 3 years ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆213 · Sep 19, 2024 · Updated last year
- Heart sounds segmentation based on LSTM neural network and Fourier Synchrosqueezed Transform. ☆56 · Feb 1, 2026 · Updated last month
- DUSTED: Spoken-Term Discovery using Discrete Speech Units ☆18 · Oct 2, 2024 · Updated last year
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models ☆1,019 · Dec 15, 2025 · Updated 3 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆40 · Jan 27, 2025 · Updated last year
- ☆12 · Aug 10, 2023 · Updated 2 years ago
- Unified automatic quality assessment for speech, music, and sound. ☆694 · Jun 5, 2025 · Updated 9 months ago
- Open-source audio embedding models, submitted to the HEAR 2021 challenge ☆11 · Feb 15, 2026 · Updated last month
- Rearrange a music recording to match a new duration - Code for "Music Rearrangement Using Hierarchical Segmentation", ICASSP 2023 ☆45 · Mar 30, 2024 · Updated last year