Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (ViT) backbone to predict latent representations of masked spectrogram patches.
☆42 · Mar 19, 2026 · Updated this week
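To make "masked spectrogram patches" concrete, here is a minimal sketch of the patching and masking step only, in plain Python. It is an illustration, not the repository's code: the function names `patchify` and `split_context_target`, the 4×4 patch size, the 50% mask ratio, and the toy 8×12 spectrogram are all assumptions. Uniform random masking is used as a stand-in, since the description above does not state which masking strategy Audio-JEPA uses.

```python
import random


def patchify(spec, patch_h, patch_w):
    """Split a 2-D spectrogram (list of rows) into non-overlapping patches.

    Returns patches in row-major order; each patch is a flat list of
    patch_h * patch_w values. Trailing rows/columns that do not fill a
    whole patch are dropped.
    """
    n_rows = len(spec) // patch_h
    n_cols = len(spec[0]) // patch_w
    patches = []
    for pr in range(n_rows):
        for pc in range(n_cols):
            patch = [
                spec[pr * patch_h + i][pc * patch_w + j]
                for i in range(patch_h)
                for j in range(patch_w)
            ]
            patches.append(patch)
    return patches


def split_context_target(n_patches, mask_ratio, seed=0):
    """Randomly pick target (masked) patch indices; the rest are context.

    Hypothetical helper: uniform random masking, chosen for simplicity.
    """
    rng = random.Random(seed)
    n_target = max(1, int(n_patches * mask_ratio))
    target = sorted(rng.sample(range(n_patches), n_target))
    target_set = set(target)
    context = [i for i in range(n_patches) if i not in target_set]
    return context, target


# Toy "spectrogram": 8 mel bins x 12 time frames with made-up values.
spec = [[float(m * 12 + t) for t in range(12)] for m in range(8)]
patches = patchify(spec, patch_h=4, patch_w=4)  # 2 x 3 grid of patches
context_idx, target_idx = split_context_target(len(patches), mask_ratio=0.5)
context_patches = [patches[i] for i in context_idx]
```

In a JEPA-style setup, an encoder would see only `context_patches`, and a predictor would be trained to match the embeddings of the patches at `target_idx` rather than their raw spectrogram values. The encoder and predictor themselves are omitted here.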
Alternatives and similar repositories for Audio-JEPA
Users who are interested in Audio-JEPA are comparing it to the libraries listed below.
- Forced alignment decoder for Whisper. ☆15 · Mar 13, 2024 · Updated 2 years ago
- ☆36 · Sep 6, 2025 · Updated 6 months ago
- Speech Resynthesis and Language Modeling ☆27 · Jun 11, 2025 · Updated 9 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 11 months ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 8 months ago
- Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model" ☆15 · Jun 28, 2024 · Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆50 · May 1, 2025 · Updated 10 months ago
- ☆100 · Jan 19, 2026 · Updated 2 months ago
- This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without… ☆52 · Mar 17, 2026 · Updated last week
- ☆12 · Mar 11, 2025 · Updated last year
- MIR conference deadline countdowns ☆11 · Updated this week
- Text-To-Speech for NotebookLM