OpenT2S / LlamaVoice
LlamaVoice is a LLaMA-based large voice generation model that supports both inference and training.
☆232 · Updated 7 months ago
Alternatives and similar repositories for LlamaVoice:
Users who are interested in LlamaVoice are comparing it to the libraries listed below:
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆235 · Updated 2 weeks ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆203 · Updated this week
- We Speech Transcript based on LLM, in 300 lines of code. ☆149 · Updated 3 weeks ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3 ☆396 · Updated 6 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆163 · Updated 3 weeks ago
- Collection of Open Source Speech Data ☆152 · Updated 4 months ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆261 · Updated 3 weeks ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆148 · Updated 2 weeks ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆164 · Updated 2 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆151 · Updated 2 weeks ago
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ☆124 · Updated last month
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice ☆270 · Updated 2 months ago
- This is the audio sample repository for speech separation model "MossFormer2". ☆120 · Updated 4 months ago
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆213 · Updated 9 months ago
- Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System. ☆162 · Updated 4 months ago
- Official implementation of the TTS model Lina-Speech ☆157 · Updated 2 months ago
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆515 · Updated 4 months ago
- The reproduced code for Google's SoundStorm ☆265 · Updated last year
- A lightweight end-to-end text-to-speech model ☆110 · Updated last month
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆124 · Updated this week
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆138 · Updated last year
- VALL-E 2 reproduction ☆122 · Updated 8 months ago
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud… ☆93 · Updated 3 months ago
- Diffusion Singing Voice Conversion based on Grad-TTS from HuaWei ☆141 · Updated last year
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆156 · Updated this week
- An unofficial PyTorch implementation of the StreamVC (Real-Time Low-Latency Voice Conversion) ☆119 · Updated 7 months ago