Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
☆136 · Feb 23, 2026 · Updated last week
Alternatives and similar repositories for m2d
Users who are interested in m2d are comparing it to the libraries listed below.
- EVAR ~ Evaluation package for Audio Representations · ☆74 · Feb 19, 2026 · Updated last week
- ☆114 · May 13, 2025 · Updated 9 months ago
- Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer" · ☆414 · Aug 14, 2022 · Updated 3 years ago
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models · ☆22 · Jul 10, 2024 · Updated last year
- A library built for easier audio self-supervised training and downstream task evaluation · ☆136 · Sep 25, 2025 · Updated 5 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 · ☆48 · Jan 19, 2026 · Updated last month
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations · ☆100 · Feb 20, 2026 · Updated last week
- JEPAs for audio representation learning · ☆18 · Jun 22, 2025 · Updated 8 months ago
- ISMIR 24 Supplementary Material · ☆14 · Oct 28, 2024 · Updated last year
- This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training … · ☆330 · Nov 20, 2024 · Updated last year
- State-of-the-art pretrained music models for training, evaluation, and inference · ☆163 · Jan 20, 2026 · Updated last month
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" · ☆11 · Apr 10, 2025 · Updated 10 months ago
- ☆117 · Updated this week
- A standardized toolkit of Kernel Audio Distance (KAD), a distribution-free, unbiased, and computationally efficient metric for evaluating … · ☆95 · Jun 12, 2025 · Updated 8 months ago
- Official Implementation of GLAP: General Language Audio Pretraining · ☆61 · Jan 5, 2026 · Updated last month
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer · ☆221 · Nov 30, 2025 · Updated 3 months ago
- Dual Bayesian ResNet: A Deep Learning Approach to Heart Murmur Detection (Physionet Challenge 2022) · ☆23 · Oct 1, 2025 · Updated 5 months ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model · ☆135 · Nov 5, 2025 · Updated 3 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" · ☆213 · Sep 19, 2024 · Updated last year
- ☆19 · May 9, 2019 · Updated 6 years ago
- DUSTED: Spoken-Term Discovery using Discrete Speech Units · ☆18 · Oct 2, 2024 · Updated last year
- The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection" · ☆472 · Sep 18, 2025 · Updated 5 months ago
- This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection" · ☆158 · Aug 24, 2025 · Updated 6 months ago
- Evaluation tool used in the BigVSAN paper · ☆14 · Mar 22, 2024 · Updated last year
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P: lightweight adapter for editing and controlling the target language's phoneme… · ☆23 · Aug 14, 2025 · Updated 6 months ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… · ☆24 · Oct 8, 2025 · Updated 4 months ago
- Official PyTorch implementation of Contrastive Learning of Musical Representations · ☆335 · Jul 25, 2024 · Updated last year
- logWMSE, an audio quality metric with support for digital silence target. Useful for evaluating audio source separation systems, even whe… · ☆37 · Jun 24, 2025 · Updated 8 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' · ☆101 · Jul 24, 2024 · Updated last year
- Small audio language model for reasoning · ☆86 · Dec 4, 2025 · Updated 2 months ago
- PAM is a no-reference audio quality metric for audio generation tasks · ☆77 · Jul 19, 2024 · Updated last year
- MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows · ☆124 · Sep 2, 2025 · Updated 6 months ago
- ☆40 · Feb 18, 2026 · Updated last week
- Collection of scripts from mHuBERT-147 · ☆32 · Nov 19, 2024 · Updated last year
- ☆33 · Dec 23, 2025 · Updated 2 months ago
- Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023) · ☆72 · Mar 11, 2025 · Updated 11 months ago
- [Official Implementation] Acoustic Autoregressive Modeling 🔥 · ☆75 · Aug 24, 2024 · Updated last year
- The official implementation of DMEL, the method presented in the paper "DMEL: The differentiable log-Mel spectrogram as a trainable layer … · ☆22 · Dec 21, 2024 · Updated last year
- Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments · ☆18 · Aug 1, 2025 · Updated 7 months ago