maitrix-org / Voila
☆142Updated this week
Alternatives and similar repositories for Voila:
Users who are interested in Voila are comparing it to the libraries listed below.
- We Speech Transcript based on LLM, in 300 lines of code.☆160Updated 2 weeks ago
- ☆254Updated last week
- ☆221Updated last month
- A lightweight end-to-end text-to-speech model☆113Updated 2 months ago
- ☆37Updated 2 weeks ago
- GPT-4o-level, real-time spoken dialogue system.☆321Updated 3 months ago
- Kyutai with an "eye"☆189Updated last month
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆81Updated last year
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆185Updated 2 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆162Updated 11 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants☆184Updated last week
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability.☆232Updated 8 months ago
- ☆195Updated 7 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis☆264Updated last month
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages☆145Updated 2 months ago
- ☆158Updated 5 months ago
- A toolkit for speaker diarization.☆185Updated this week
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆34Updated 8 months ago
- ☆91Updated last week
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (such as fish/cosy/chattts).☆73Updated 2 weeks ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆314Updated 4 months ago
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs☆91Updated 3 weeks ago
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances …☆66Updated 4 months ago
- PodAgent: A Comprehensive Framework for Podcast Generation☆79Updated 3 weeks ago
- F5-TTS inference acceleration, roughly 4x faster!☆80Updated 4 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆245Updated last month
- ☆150Updated 3 months ago
- MaskGCT demo page☆14Updated 3 months ago
- flow mirror models from JZX AI Labs☆45Updated 7 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆405Updated 7 months ago