MOSS-Audio-Tokenizer is a causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivering state-of-the-art reconstruction and strong performance on generation and understanding tasks. It is intended to serve as a unified interface for next-generation native audio language models.
☆132 · Mar 6, 2026 · Updated this week
Alternatives and similar repositories for MOSS-Audio-Tokenizer
Users who are interested in MOSS-Audio-Tokenizer are comparing it to the libraries listed below.
- Try to replicate the architecture of MiniMaxTTS described in its technical report ☆49 · Sep 2, 2025 · Updated 6 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 · Mar 11, 2025 · Updated 11 months ago
- Open-source code for our paper, Steering Autoregressive Music Generation with Recursive Feature Machines (Zhao et al., 2025). aka MusicRF… ☆36 · Oct 26, 2025 · Updated 4 months ago
- A TTS Trained on Universal Audio. ☆41 · Jun 6, 2025 · Updated 9 months ago
- Official code for SongEcho ☆48 · Updated this week
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ☆22 · Feb 7, 2026 · Updated 3 weeks ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆134 · Sep 19, 2025 · Updated 5 months ago
- ☆20 · May 7, 2025 · Updated 9 months ago
- ☆60 · Oct 22, 2025 · Updated 4 months ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆92 · Dec 28, 2024 · Updated last year
- ☆49 · Feb 12, 2026 · Updated 3 weeks ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols ☆17 · Nov 19, 2025 · Updated 3 months ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 · Feb 9, 2025 · Updated last year
- ☆99 · Jan 19, 2026 · Updated last month
- ☆25 · Jan 24, 2023 · Updated 3 years ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 10 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 3 months ago
- A Python tool to help interact with ChatGPT. ☆10 · Dec 11, 2022 · Updated 3 years ago
- ☆68 · Dec 30, 2025 · Updated 2 months ago
- ☆83 · Dec 31, 2025 · Updated 2 months ago
- Self-supervised Generative LM-based Voice Conversion ☆54 · Apr 24, 2025 · Updated 10 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆212 · Sep 19, 2024 · Updated last year
- Training, validation, and inference code for various SSL approaches and architectures. ☆79 · Oct 22, 2025 · Updated 4 months ago
- List of podcast feeds (via the iTunes API) and a script to download ~6,000,000 hours of English speech. ☆31 · Apr 13, 2023 · Updated 2 years ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆315 · Aug 4, 2025 · Updated 7 months ago
- Codec for the paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆348 · Jul 21, 2025 · Updated 7 months ago
- ☆11 · Feb 20, 2025 · Updated last year
- Mandarin Chinese audio datasets aligned with Montreal Forced Aligner ☆15 · Aug 13, 2024 · Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆31 · Jan 25, 2026 · Updated last month
- ☆22 · Jul 30, 2025 · Updated 7 months ago
- A Singing Style Conversion Framework Based On Audio Infilling ☆33 · Apr 28, 2025 · Updated 10 months ago
- Official implementation of the paper: Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis ☆51 · Sep 20, 2025 · Updated 5 months ago
- The baselines of ARC-Challenge-Interspeech2026 ☆56 · Dec 1, 2025 · Updated 3 months ago
- Implementation of RIFT-SVC, a singing voice conversion model based on Rectified Flow Transformer. ☆56 · Nov 10, 2025 · Updated 3 months ago
- ☆45 · Feb 25, 2026 · Updated last week
- ☆149 · Feb 25, 2026 · Updated last week
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis" ☆108 · Dec 20, 2025 · Updated 2 months ago
- A guide to grapheme-to-phoneme conversion and a phoneme list for the ACE singing voice synthesis engine ☆42 · Jan 17, 2025 · Updated last year
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation ☆61 · Mar 31, 2025 · Updated 11 months ago