THUDM / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
☆2,347Updated 3 weeks ago
Alternatives and similar repositories for GLM-4-Voice:
Users who are interested in GLM-4-Voice are comparing it to the repositories listed below.
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,271Updated 3 months ago
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…☆3,148Updated 3 weeks ago
- Multilingual Voice Understanding Model☆3,563Updated this week
- Officially recommended ChatTTS resource collection project, gathering related resources from across the web and frequently asked questions☆1,263Updated 5 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆6,565Updated last week
- ☆1,065Updated 5 months ago
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,497Updated 4 months ago
- GPT4V-level open-source multi-modal model based on Llama3-8B☆2,151Updated 3 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".☆829Updated last month
- EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation☆1,396Updated this week
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.☆3,371Updated last month
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM☆972Updated last month
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.☆1,634Updated 3 weeks ago
- ☆1,164Updated last week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆7,650Updated this week
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model☆3,700Updated 2 months ago
- 🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.☆876Updated this week
- ☆6,870Updated last week
- Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM-based AI clipping integrated.☆3,815Updated 3 months ago
- Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.☆3,932Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆3,571Updated last month
- A simple screen parsing tool towards pure vision based GUI agent☆5,014Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…☆2,630Updated 2 weeks ago
- EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning☆3,144Updated last week
- Inference and training library for high-quality TTS models.☆4,696Updated this week
- Real time interactive streaming digital human☆4,021Updated this week
- ☆553Updated 5 months ago
- Awesome Digital Human☆970Updated this week
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising☆2,478Updated 5 months ago
- ☆289Updated 4 months ago