Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
☆146Feb 23, 2026Updated last month
Alternatives and similar repositories for m2d
Users that are interested in m2d are comparing it to the libraries listed below.
- EVAR ~ Evaluation package for Audio Representations☆75Feb 19, 2026Updated last month
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models☆22Jul 10, 2024Updated last year
- A library built for easier audio self-supervised training, downstream tasks evaluation☆136Sep 25, 2025Updated 6 months ago
- Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".☆418Aug 14, 2022Updated 3 years ago
- ISMIR 24 Supplementary Material☆14Oct 28, 2024Updated last year
- Dual Bayesian ResNet: A Deep Learning Approach to Heart Murmur Detection (Physionet Challenge 2022)☆23Oct 1, 2025Updated 6 months ago
- ☆123May 13, 2025Updated 10 months ago
- JEPAs for audio representation learning☆19Jun 22, 2025Updated 9 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 2 months ago
- Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations☆100Feb 20, 2026Updated last month
- ☆117Mar 24, 2026Updated 2 weeks ago
- This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training …☆338Nov 20, 2024Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer☆224Nov 30, 2025Updated 4 months ago
- State-of-the-art pretrained music models for training, evaluation, inference☆171Jan 20, 2026Updated 2 months ago
- The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"☆481Sep 18, 2025Updated 6 months ago
- This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".☆162Aug 24, 2025Updated 7 months ago
- ☆33Dec 23, 2025Updated 3 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆100Jul 24, 2024Updated last year
- A standardized toolkit of Kernel Audio Distance (KAD)—a distribution-free, unbiased, and computationally efficient metric for evaluating …☆97Jun 12, 2025Updated 9 months ago
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…☆40Jan 27, 2025Updated last year
- Multi-lingual AudioCaps☆12Nov 20, 2023Updated 2 years ago
- ☆30Jun 22, 2022Updated 3 years ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"☆84Nov 7, 2025Updated 5 months ago
- [SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model☆137Nov 5, 2025Updated 5 months ago
- Official Implementation of GLAP - General Language Audio Pretraining☆68Mar 25, 2026Updated 2 weeks ago
- Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)☆73Mar 11, 2025Updated last year
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"☆214Sep 19, 2024Updated last year
- Official PyTorch implementation of Contrastive Learning of Musical Representations☆335Jul 25, 2024Updated last year
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI…☆24Oct 8, 2025Updated 6 months ago
- ☆41Feb 18, 2026Updated last month
- PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models☆1,030Dec 15, 2025Updated 3 months ago
- ☆43Feb 21, 2023Updated 3 years ago
- NeMo: a toolkit for conversational AI☆13May 4, 2024Updated last year
- small audio language model for reasoning☆85Dec 4, 2025Updated 4 months ago
- ☆14Nov 22, 2022Updated 3 years ago
- MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows☆131Sep 2, 2025Updated 7 months ago
- Heart sounds segmentation based on LSTM neural network and Fourier Synchrosqueezed Transform.☆56Feb 1, 2026Updated 2 months ago
- This repository contains metadata of WavCaps dataset and codes for downstream tasks.☆257Jul 25, 2024Updated last year
- DUSTED: Spoken-Term Discovery using Discrete Speech Units☆18Oct 2, 2024Updated last year