OpenMOSS / MOSS-Audio-Tokenizer
MOSS-Audio-Tokenizer is a causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivers state-of-the-art reconstruction, and performs strongly in generation and understanding, serving as a unified interface for next-generation native audio language models.
☆85 · Updated this week
Alternatives and similar repositories for MOSS-Audio-Tokenizer
Users who are interested in MOSS-Audio-Tokenizer are comparing it to the libraries listed below.
- Open Source code for our paper, Steering Autoregressive Music Generation with Recursive Feature Machines (Zhao et al., 2025). aka MusicRF… ☆31 · Oct 26, 2025 · Updated 3 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 · Mar 11, 2025 · Updated 11 months ago
- An attempt to replicate the architecture of MiniMaxTTS described in its technical report ☆49 · Sep 2, 2025 · Updated 5 months ago
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ☆22 · Feb 7, 2026 · Updated last week
- ☆59 · Oct 22, 2025 · Updated 3 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 3 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 10 months ago
- A TTS Trained on Universal Audio. ☆41 · Jun 6, 2025 · Updated 8 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆132 · Sep 19, 2025 · Updated 4 months ago
- List of Podcast Feeds using iTunes API and script to download 6,000,000~ hours of English speech. ☆31 · Apr 13, 2023 · Updated 2 years ago
- ☆11 · Feb 20, 2025 · Updated 11 months ago
- ☆22 · Jul 30, 2025 · Updated 6 months ago
- A Singing Style Conversion Framework Based On Audio Infilling ☆33 · Apr 28, 2025 · Updated 9 months ago
- Official implementation of paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis ☆50 · Sep 20, 2025 · Updated 4 months ago
- The baselines of ARC-Challenge-Interspeech2026 ☆56 · Dec 1, 2025 · Updated 2 months ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆12 · Jun 11, 2024 · Updated last year
- ☆41 · Nov 4, 2025 · Updated 3 months ago
- Compute WER and SER for speech recognition evaluation ☆26 · Dec 15, 2025 · Updated 2 months ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆31 · Aug 30, 2025 · Updated 5 months ago
- Text-To-Speech for NotebookLM ☆37 · Jul 20, 2025 · Updated 6 months ago
- poorman's ar-dit tts ☆45 · Dec 31, 2025 · Updated last month
- Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English. ☆114 · Dec 2, 2025 · Updated 2 months ago
- Official implementation of the paper: "NeoBabel: A Multilingual Open Tower for Visual Generation" ☆22 · Aug 4, 2025 · Updated 6 months ago
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi… ☆63 · Dec 26, 2025 · Updated last month
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization ☆110 · May 20, 2025 · Updated 8 months ago
- [ICME 2024] Official Repository for The Paper, PianoBART: Symbolic Piano Music Understanding and Generating with Large-Scale Pre-Training ☆22 · Aug 17, 2025 · Updated 5 months ago
- ☆82 · Jan 22, 2025 · Updated last year
- [ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion ☆91 · Jul 23, 2025 · Updated 6 months ago
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 2 months ago
- Multiband version of HiFi-GAN ☆19 · Nov 6, 2020 · Updated 5 years ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆91 · Dec 28, 2024 · Updated last year
- Self-supervised Generative LM-based Voice Conversion ☆54 · Apr 24, 2025 · Updated 9 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆51 · May 1, 2025 · Updated 9 months ago
- Voice conversion with just linear regression. ☆33 · Sep 25, 2025 · Updated 4 months ago
- Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec" ☆35 · Oct 23, 2025 · Updated 3 months ago
- Objective metrics used in several text-to-speech (TTS) papers. ☆52 · Jun 17, 2025 · Updated 7 months ago
- A pitch detection model trained to be robust against noise and reverberation environments. ☆27 · Jan 21, 2025 · Updated last year
- ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models ☆34 · Nov 18, 2025 · Updated 2 months ago
- Implementation of RIFT-SVC, a singing voice conversion model based on Rectified Flow Transformer. ☆55 · Nov 10, 2025 · Updated 3 months ago