LudovicTuncay/Audio-JEPA

Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (ViT) backbone to predict latent representations of masked spectrogram patches.

☆65
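The description above says Audio-JEPA predicts latent representations of masked spectrogram patches. The sketch below illustrates that data flow in plain Python and is not the repository's code. It splits a spectrogram into non-overlapping patches and partitions the patch indices into context and target sets. It also computes a mean-squared-error loss in representation space. The helper names (`patchify`, `sample_time_block_mask`, `latent_l2_loss`) are hypothetical. The single contiguous time-block mask is a simplification chosen for illustration and is not necessarily the masking strategy Audio-JEPA uses. In an actual JEPA-style model, the ViT context encoder, the predictor, and the target encoder would produce the embeddings compared by the loss.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def patchify(spec: Matrix, patch_f: int, patch_t: int) -> List[List[float]]:
    """Split a (freq x time) spectrogram into non-overlapping patches.

    Patches are returned in row-major order over the (freq, time) patch grid,
    each flattened to a vector of length patch_f * patch_t. Trailing rows or
    columns that do not fill a whole patch are dropped.
    """
    n_f = len(spec) // patch_f
    n_t = len(spec[0]) // patch_t
    patches = []
    for i in range(n_f):
        for j in range(n_t):
            patches.append([
                spec[i * patch_f + a][j * patch_t + b]
                for a in range(patch_f)
                for b in range(patch_t)
            ])
    return patches


def sample_time_block_mask(
    n_f: int, n_t: int, block_t: int, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Hypothetical masking: one contiguous span of time columns (all freq rows)
    becomes the target set; every other patch is context.

    Returns (context_indices, target_indices) into the row-major patch list.
    """
    start = rng.randrange(0, n_t - block_t + 1)
    target_cols = set(range(start, start + block_t))
    context, target = [], []
    for i in range(n_f):
        for j in range(n_t):
            (target if j in target_cols else context).append(i * n_t + j)
    return context, target


def latent_l2_loss(pred: Sequence[Sequence[float]],
                   target: Sequence[Sequence[float]]) -> float:
    """Mean squared error between predicted and target embeddings.

    The loss lives in representation space, not in the spectrogram domain.
    """
    total, count = 0.0, 0
    for p, t in zip(pred, target):
        for a, b in zip(p, t):
            total += (a - b) ** 2
            count += 1
    return total / count


# Toy usage: an 8x16 "spectrogram" cut into 4x4 patches gives a 2x4 patch grid.
rng = random.Random(0)
spec = [[float(f * 100 + t) for t in range(16)] for f in range(8)]
patches = patchify(spec, patch_f=4, patch_t=4)
ctx, tgt = sample_time_block_mask(n_f=2, n_t=4, block_t=2, rng=rng)
```

Masking whole time columns is one common choice for audio, since it hides every frequency band of a time span at once. The predictor must then infer the missing segment's representation from surrounding context rather than from neighbouring frequency bins.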

Alternatives and similar repositories for Audio-JEPA

Users that are interested in Audio-JEPA are comparing it to the libraries listed below.

SonyCSLParis / audio-representations
JEPAs for audio representation learning
☆26 · Jun 11, 2026 · Updated last month
SonyResearch / VRVQ
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54 · May 1, 2025 · Updated last year
pkufool / simple-wer
A simple command line tool to calculate WER for ASR.
☆14 · Oct 14, 2024 · Updated last year
york135 / MIRMLPop
The MIR-MLPop dataset and the official implementation of the paper "MIR-MLPop: A Multilingual Pop Music Dataset with Time-Aligned Lyrics …
☆35 · Apr 22, 2024 · Updated 2 years ago
labhamlet / wavjepa
This is the official codebase for WavJEPA. Time-domain audio foundation model for holistic downstream tasks. "Self-supervised learning fr…
☆34 · Feb 28, 2026 · Updated 4 months ago
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
ArenAcikgoz / Whisper-Alignment
Forced alignment decoder for Whisper.
☆16 · Mar 13, 2024 · Updated 2 years ago
Audio-Foundation-Models / ConversationTTS
☆101 · Jan 19, 2026 · Updated 6 months ago
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 8 months ago
nonverbalspeech38k / nonverspeech38k
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆68 · Dec 26, 2025 · Updated 6 months ago
hhguo / SoCodec
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92 · Dec 20, 2024 · Updated last year
google-deepmind / librispeech-long
LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …
☆99 · Dec 28, 2024 · Updated last year
MTG / SingWithExpressions
This is the accompanying repository for the paper "Automatic Estimation of Singing Voice Musical Dynamics".
☆16 · Oct 28, 2024 · Updated last year
qiuqiangkong / audioflow
☆130 · Updated this week
DBD-research-group / BioFoundation
☆16 · May 7, 2026 · Updated 2 months ago
Mddct / usm-tokenizer
semantic tokenizer for speech and music
☆20 · Jul 6, 2025 · Updated last year
llm-lab-org / CLASP
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13 · Jun 27, 2025 · Updated last year
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
sarulab-speech / DuplexChat
☆46 · Jul 5, 2026 · Updated 2 weeks ago
habla-liaa / encodecmae
Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'
☆101 · Jul 24, 2024 · Updated 2 years ago
SonyCSLParis / Stem-JEPA
Joint Embedding Predictive Architecture for Musical Stem Compatibility Estimation
☆55 · Aug 6, 2024 · Updated last year
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 4 months ago
MTG / omar-rq
Training, validation, and inference code for various SSL approaches and architectures.
☆87 · Apr 7, 2026 · Updated 3 months ago
mt-upc / ZeroSwot
Pushing the Limits of Zero-shot End-to-End Speech Translation
☆25 · Dec 12, 2024 · Updated last year
ASLP-lab / FlashTTS
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆67 · Jun 16, 2026 · Updated last month
nttcslab / eval-audio-repr
EVAR ~ Evaluation package for Audio Representations
☆81 · Feb 19, 2026 · Updated 5 months ago
aqtq314 / VogenSVS
☆15 · Apr 16, 2026 · Updated 3 months ago
yukara-ikemiya / floss-torch
PyTorch implementation of "Source Separation by Flow Matching (FLOSS)" by Google DeepMind
☆96 · Nov 24, 2025 · Updated 8 months ago
Pliploop / SSLISMIR
☆19 · Sep 20, 2025 · Updated 10 months ago
emirdemirel / DALI-TestSet4ALT
This is a subset of the DALI set consisting of 240 polyphonic recordings, used as a benchmark for evaluating lyrics transcription.
☆12 · Nov 30, 2021 · Updated 4 years ago
wsntxxn / UniFlow-Audio
☆73 · Jul 17, 2026 · Updated last week
bfs18 / armel
poor man's ar-dit tts
☆45 · Dec 31, 2025 · Updated 6 months ago
ajd12342 / paraspeechcaps
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆163 · Mar 26, 2026 · Updated 3 months ago
gzhu06 / Cacophony
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49 · Jan 19, 2026 · Updated 6 months ago
corticph / error-align
Text-to-text alignment algorithm for speech recognition error analysis.
☆32 · Jun 23, 2026 · Updated last month
XXH333 / WordVoice-main
The inference and training code for WordVoice.
☆61 · Jul 17, 2026 · Updated last week
barisbozkurt / MASTmelody_dataset
A dataset of pitch curves for music performance assessment
☆11 · Jun 5, 2023 · Updated 3 years ago
thuhcsi / SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆197 · Feb 28, 2026 · Updated 4 months ago
lmxue / Audio-FLAN
Audio-FLAN
☆161 · Sep 23, 2025 · Updated 10 months ago