MOSS-Audio-Tokenizer is a causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivers state-of-the-art reconstruction, and performs strongly in both generation and understanding, serving as a unified interface for next-generation native audio language models.
☆166 · Mar 6, 2026 · Updated 3 weeks ago
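As a rough illustration of what "variable bitrate" usually means for a discrete audio tokenizer, the sketch below computes the bitrate of a token stream from three quantities: its frame rate, the number of codebook layers kept, and the codebook size. It assumes a residual-vector-quantization-style design in which bitrate is lowered by keeping fewer codebook layers. That design, the 12.5 Hz frame rate and the 1024-entry codebooks are all hypothetical values chosen for illustration, not figures confirmed for MOSS-Audio-Tokenizer.

```python
import math


def token_bitrate(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Return the bits per second of a discrete audio token stream.

    Each frame emits one index per codebook, and each index carries
    log2(codebook_size) bits of information.
    """
    if frame_rate_hz <= 0 or num_codebooks < 1 or codebook_size < 2:
        raise ValueError("need a positive frame rate and >= 1 codebook of size >= 2")
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_token


# Hypothetical configuration (assumption, not from the MOSS-Audio-Tokenizer docs):
# 12.5 frames per second and 1024-entry codebooks, i.e. 10 bits per index.
FRAME_RATE_HZ = 12.5
CODEBOOK_SIZE = 1024

# Keeping fewer residual codebook layers lowers the bitrate proportionally.
for layers in (32, 16, 8):
    bps = token_bitrate(FRAME_RATE_HZ, layers, CODEBOOK_SIZE)
    print(f"{layers:2d} codebooks -> {bps / 1000:.2f} kbps")
```

With these assumed numbers, the loop prints 4.00, 2.00 and 1.00 kbps for 32, 16 and 8 codebooks. The same token stream can therefore serve several bitrates simply by truncating the deeper quantizer layers.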
Alternatives and similar repositories for MOSS-Audio-Tokenizer
Users who are interested in MOSS-Audio-Tokenizer are comparing it to the libraries listed below.
- An attempt to replicate the architecture of MiniMaxTTS described in its technical report ☆48 · Sep 2, 2025 · Updated 6 months ago
- ☆68 · Dec 30, 2025 · Updated 2 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆13 · Mar 11, 2025 · Updated last year
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ☆23 · Mar 17, 2026 · Updated last week
- Open-source code for our paper, Steering Autoregressive Music Generation with Recursive Feature Machines (Zhao et al., 2025). aka MusicRF… ☆38 · Oct 26, 2025 · Updated 5 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆213 · Sep 19, 2024 · Updated last year
- A TTS Trained on Universal Audio. ☆41 · Jun 6, 2025 · Updated 9 months ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 · Feb 9, 2025 · Updated last year
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆93 · Dec 28, 2024 · Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated 11 months ago
- Codec for the paper "LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis" ☆348 · Jul 21, 2025 · Updated 8 months ago
- ☆60 · Oct 22, 2025 · Updated 5 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction. ☆137 · Sep 19, 2025 · Updated 6 months ago
- Official code for SongEcho ☆53 · Mar 3, 2026 · Updated 3 weeks ago
- ☆38 · Jun 16, 2024 · Updated last year
- ☆100 · Jan 19, 2026 · Updated 2 months ago
- ☆25 · Jan 24, 2023 · Updated 3 years ago
- Mandarin Chinese audio datasets aligned with the Montreal Forced Aligner ☆17 · Aug 13, 2024 · Updated last year
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models ☆52 · Sep 2, 2025 · Updated 6 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 4 months ago
- ☆50 · Feb 12, 2026 · Updated last month
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 · Nov 1, 2024 · Updated last year
- Self-supervised Generative LM-based Voice Conversion ☆55 · Apr 24, 2025 · Updated 11 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆317 · Aug 4, 2025 · Updated 7 months ago
- Compute WER and SER for speech recognition evaluation ☆27 · Mar 18, 2026 · Updated last week
- A guide to grapheme-to-phoneme conversion and a phoneme list for the ACE singing voice synthesis engine ☆42 · Jan 17, 2025 · Updated last year
- ☆20 · May 7, 2025 · Updated 10 months ago
- A Python tool that helps you interact with ChatGPT. ☆10 · Dec 11, 2022 · Updated 3 years ago
- A list of podcast feeds using the iTunes API and a script to download 6,000,000~ hours of English speech. ☆31 · Apr 13, 2023 · Updated 2 years ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆35 · Aug 30, 2025 · Updated 6 months ago
- MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows ☆130 · Sep 2, 2025 · Updated 6 months ago
- Implementation of "Look, Listen and Recognise: character-aware audio-visual subtitling" ☆20 · Nov 3, 2025 · Updated 4 months ago
- DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting ☆17 · Mar 4, 2025 · Updated last year
- ☆11 · Feb 20, 2025 · Updated last year
- Training, validation, and inference code for various SSL approaches and architectures. ☆81 · Oct 22, 2025 · Updated 5 months ago
- Implementation of RIFT-SVC, a singing voice conversion model based on a Rectified Flow Transformer. ☆56 · Nov 10, 2025 · Updated 4 months ago
- ☆20 · Jun 5, 2022 · Updated 3 years ago
- A poor man's AR-DiT TTS ☆45 · Dec 31, 2025 · Updated 2 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆76 · Jan 25, 2026 · Updated 2 months ago