SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
☆873 Oct 10, 2025 Updated 4 months ago
Alternatives and similar repositories for SONAR
Users who are interested in SONAR are comparing it to the libraries listed below.
- Large Concept Models: Language modeling in a sentence representation space ☆2,338 Jan 29, 2025 Updated last year
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆297 Updated this week
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications ☆87 Dec 20, 2024 Updated last year
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… ☆649 Jun 9, 2024 Updated last year
- An Open-source Streaming High-fidelity Neural Audio Codec ☆498 Mar 4, 2025 Updated last year
- MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation ☆402 Sep 11, 2023 Updated 2 years ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆194 Jul 12, 2024 Updated last year
- ☆390 Sep 3, 2024 Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆51 May 1, 2025 Updated 10 months ago
- Code for BLT research paper ☆2,029 Nov 3, 2025 Updated 4 months ago
- Unified automatic quality assessment for speech, music, and sound. ☆684 Jun 5, 2025 Updated 9 months ago
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context ☆216 Sep 10, 2024 Updated last year
- WavJourney: Compositional Audio Creation with LLMs ☆540 Sep 28, 2023 Updated 2 years ago
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,760 Updated this week
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 Apr 10, 2025 Updated 10 months ago
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ☆471 Apr 24, 2024 Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models ☆35 Oct 13, 2024 Updated last year
- Speech Resynthesis and Language Modeling ☆27 Jun 11, 2025 Updated 8 months ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener… ☆441 Jan 25, 2024 Updated 2 years ago
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" ☆368 Sep 3, 2024 Updated last year
- Keep track of big models in audio domain, including speech, singing, music etc. ☆506 Sep 26, 2024 Updated last year
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆152 Sep 14, 2023 Updated 2 years ago
- Awesome speech/audio LLMs, representation learning, and codec models ☆1,210 Aug 13, 2025 Updated 6 months ago
- The Open Source Code of UniAudio ☆605 Jul 22, 2024 Updated last year
- SpeechGPT Series: Speech Large Language Models ☆1,405 Jul 22, 2024 Updated last year
- A differentiable version of SPTK ☆193 Feb 26, 2026 Updated last week
- Official PyTorch implementation of BigVGAN (ICLR 2023) ☆1,190 Sep 5, 2024 Updated last year
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆212 Sep 19, 2024 Updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆309 Mar 31, 2025 Updated 11 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆293 Oct 12, 2025 Updated 4 months ago
- Audio Dataset for training CLAP and other models ☆732 Jan 8, 2026 Updated 2 months ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆269 May 19, 2024 Updated last year
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆26 Dec 12, 2024 Updated last year
- Self-Supervised Speech Pre-training and Representation Learning Toolkit ☆2,533 Jun 13, 2025 Updated 8 months ago
- Unsupervised Rhythm Modeling for Voice Conversion ☆86 Aug 3, 2023 Updated 2 years ago
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆147 Jan 1, 2025 Updated last year
- A library for speech data augmentation in time-domain ☆683 Aug 30, 2021 Updated 4 years ago
- ☆100 Jan 19, 2026 Updated last month
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning ☆159 Jun 13, 2024 Updated last year