OpenT2S / LlamaVoice
LlamaVoice is a Llama-based large voice generation model that supports both inference and training.
☆233 · Updated 6 months ago
Alternatives and similar repositories for LlamaVoice:
Users who are interested in LlamaVoice also compare it with the repositories listed below:
- We Speech Transcript based on LLM, in 300 lines of code. ☆149 · Updated last week
- Collection of Open Source Speech Data ☆152 · Updated 4 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆222 · Updated last week
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆171 · Updated last week
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3 ☆393 · Updated 6 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆32 · Updated 6 months ago
- Official implementation of the TTS model Lina-Speech ☆157 · Updated 2 months ago
- A lightweight end-to-end text-to-speech model ☆110 · Updated 2 weeks ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆260 · Updated last week
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice ☆264 · Updated last month
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆153 · Updated last month
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆94 · Updated 5 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆109 · Updated 3 weeks ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆140 · Updated this week
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆207 · Updated 8 months ago
- An unofficial PyTorch implementation of StreamVC (Real-Time Low-Latency Voice Conversion) ☆119 · Updated 7 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆154 · Updated 9 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆149 · Updated 2 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆448 · Updated 3 weeks ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆142 · Updated 2 weeks ago
- Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space. ☆192 · Updated this week
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆118 · Updated this week