nttcslab/m2d


Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

☆161

Alternatives and similar repositories for m2d

Users who are interested in m2d are comparing it to the libraries listed below.

nttcslab / eval-audio-repr
EVAR ~ Evaluation package for Audio Representations
☆81 · Feb 19, 2026 · Updated 5 months ago
Sreyan88 / CompA
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23 · Jul 10, 2024 · Updated 2 years ago
Audio-WestlakeU / audiossl
A library for easier audio self-supervised training and downstream task evaluation
☆140 · Sep 25, 2025 · Updated 9 months ago
YuanGongND / ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
☆426 · Aug 14, 2022 · Updated 3 years ago
Benjamin-Walker / heart-murmur-detection
Dual Bayesian ResNet: A Deep Learning Approach to Heart Murmur Detection (Physionet Challenge 2022)
☆23 · Oct 1, 2025 · Updated 9 months ago
ismir-24-sub / unsupervised_compositional_representations
ISMIR 24 Supplementary Material
☆14 · Oct 28, 2024 · Updated last year
gzhu06 / Cacophony
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49 · Jan 19, 2026 · Updated 6 months ago
nttcslab / msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
☆99 · Feb 20, 2026 · Updated 5 months ago
SonyCSLParis / audio-representations
JEPAs for audio representation learning
☆26 · Jun 11, 2026 · Updated last month
qiuqiangkong / audioflow
☆128 · Updated this week
fschmid56 / EfficientAT
This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training …
☆353 · Nov 20, 2024 · Updated last year
jimbozhang / xares
A benchmark for evaluating audio encoders on various audio tasks.
☆55 · Apr 27, 2026 · Updated 2 months ago
cwx-worst-one / EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
☆237 · Nov 30, 2025 · Updated 7 months ago
Sreyan88 / ReCLAP
☆33 · Dec 23, 2025 · Updated 6 months ago
a43992899 / MARBLE
State-of-the-art pretrained music models for training, evaluation, inference
☆182 · Jan 20, 2026 · Updated 6 months ago
fschmid56 / PretrainedSED
☆144 · May 13, 2025 · Updated last year
RetroCirce / HTS-Audio-Transformer
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
☆502 · Sep 18, 2025 · Updated 10 months ago
xiaomi-research / dasheng-glap
Official Implementation of GLAP - General Language Audio Pretraining
☆74 · May 14, 2026 · Updated 2 months ago
habla-liaa / encodecmae
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
☆101 · Jul 24, 2024 · Updated last year
Audio-WestlakeU / ATST-SED
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
☆172 · Jun 8, 2026 · Updated last month
RicherMans / Dasheng
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆86 · Nov 7, 2025 · Updated 8 months ago
soham97 / mellow
Small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
kyegomez / AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…
☆39 · Jan 27, 2025 · Updated last year
google-deepmind / slowfast_nfnets
☆30 · Jun 22, 2022 · Updated 4 years ago
raymin0223 / patch-mix_contrastive_learning
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)
☆76 · Mar 11, 2025 · Updated last year
Aria-K-Alethia / BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆218 · Sep 19, 2024 · Updated last year
hlt-mt / Speech-MASSIVE
Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI…
☆25 · Oct 8, 2025 · Updated 9 months ago
SiavashShams / ssamba
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
☆140 · Nov 5, 2025 · Updated 8 months ago
nttcslab / dcase2023_task2_evaluator
☆12 · Aug 10, 2023 · Updated 2 years ago
Spijkervet / CLMR
Official PyTorch implementation of Contrastive Learning of Musical Representations
☆338 · Jul 25, 2024 · Updated last year
sarulab-speech / ml-audiocaps
Multi-lingual AudioCaps
☆14 · Nov 20, 2023 · Updated 2 years ago
fgnt / sed_scores_eval
☆41 · Feb 18, 2026 · Updated 5 months ago
tabahi / contexless-phonemes-CUPE
PyTorch model for contextless phoneme prediction from speech audio
☆32 · Oct 30, 2025 · Updated 8 months ago
chimechallenge / C8DASR-Baseline-NeMo
NeMo: a toolkit for conversational AI
☆13 · May 4, 2024 · Updated 2 years ago
i-need-sleep / mad
☆16 · Sep 29, 2025 · Updated 9 months ago
haoheliu / ontology-aware-audio-tagging
☆14 · Nov 22, 2022 · Updated 3 years ago
bshall / dusted
DUSTED: Spoken-Term Discovery using Discrete Speech Units
☆17 · Oct 2, 2024 · Updated last year
XinhaoMei / WavCaps
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
☆264 · Jul 25, 2024 · Updated last year
ilyassmoummad / scl_icbhi2017
PyTorch implementation of our work: Pretraining Respiratory Sound Representations using Metadata and Contrastive Learning (WASPAA 2023)
☆33 · Feb 4, 2024 · Updated 2 years ago