zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

☆349

Alternatives and similar repositories for X-Codec-2.0

Users who are interested in X-Codec-2.0 are comparing it to the repositories listed below.

yangdongchao / RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
☆252 Updated 10 months ago
xingchensong / S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆503 Updated last month
X-LANCE / VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
☆366 Updated last year
Plachtaa / FAcodec
Training code for FAcodec presented in NaturalSpeech3
☆237 Updated last year
zhenye234 / xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
☆289 Updated 3 months ago
zhenye234 / CoMoSpeech
ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
☆211 Updated last year
HeCheng0625 / Diffusion-Speech-Tokenizer
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆197 Updated 2 weeks ago
boson-ai / EmergentTTS-Eval-public
[NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.
☆189 Updated 2 months ago
facebookresearch / FlowDec
A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.
☆197 Updated 6 months ago
KdaiP / StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
☆434 Updated last year
Aria-K-Alethia / BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆211 Updated last year
lifeiteng / naturalspeech3_facodec
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
☆234 Updated last year
jishengpeng / WavChat
A Survey of Spoken Dialogue Models (60 pages)
☆316 Updated last year
Choddeok / EmoSpherepp
[TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec…
☆118 Updated 5 months ago
sh-lee-prml / PeriodWave
The official implementation of PeriodWave and PeriodWave-Turbo
☆217 Updated 9 months ago
FrontierLabs / F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
☆147 Updated 8 months ago
MatthewCYM / VoiceBench
VoiceBench: Benchmarking LLM-Based Voice Assistants
☆330 Updated last week
tarepan / SpeechMOS
Easy-to-Use Speech MOS predictors
☆346 Updated 2 years ago
nii-yamagishilab / ZMM-TTS
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
☆184 Updated last year
adelacvg / ttts
Train the next generation of TTS systems.
☆171 Updated last year
huggingface / dataspeech
☆388 Updated last year
ex3ndr / supervoice-vall-e-2
VALL-E 2 reproduction
☆134 Updated last year
yl4579 / PL-BERT
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
☆267 Updated last year
Zain-Jiang / Speech-Editing-Toolkit
It's a repository for implementations of neural speech editing algorithms.
☆203 Updated 2 years ago
Choddeok / EmoSphere-TTS
[INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …
☆170 Updated 8 months ago
lucidrains / spear-tts-pytorch
Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch
☆277 Updated 2 years ago
haoheliu / SemantiCodec-inference
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
☆245 Updated 11 months ago
tonychenxyz / emoknob
This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…
☆81 Updated last year
LqNoob / Neural-Codec-and-Speech-Language-Models
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
☆239 Updated last month
yl4579 / StyleTTS-ZS
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
☆187 Updated last year