THUDM / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
☆2,182 Updated this week
Related projects
Alternatives and complementary repositories for GLM-4-Voice
- Multilingual Voice Understanding Model ☆3,359 Updated 3 weeks ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,202 Updated 2 months ago
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆6,163 Updated last week
- Open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,067 Updated last week
- ☆1,036 Updated 4 months ago
- Real-time interactive streaming digital human ☆3,836 Updated this week
- Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LL… ☆2,011 Updated last month
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,105 Updated 2 months ago
- A simple screen parsing tool toward a pure-vision-based GUI agent ☆4,485 Updated last week
- 🍦 Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a Gradio-based WebUI. ☆839 Updated this week
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,478 Updated 4 months ago
- EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning ☆2,881 Updated 2 months ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images. ☆2,247 Updated 3 weeks ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆6,873 Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆3,499 Updated last week
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting ☆2,778 Updated this week
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM ☆949 Updated 2 weeks ago
- Awesome Digital Human ☆894 Updated last week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,538 Updated last month
- An ultra-lightweight digital human model that can run in real time on mobile devices ☆822 Updated last week
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆3,000 Updated last month
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆6,867 Updated this week
- Officially recommended ChatTTS resource collection project, gathering related resources and frequently asked questions from across the web ☆1,202 Updated 4 months ago
- ☆286 Updated 3 months ago
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance ☆5,969 Updated 2 weeks ago
- Streamer-Sales (Sales Champion): a sales-streamer LLM 🛒🎁 that presents products based on their given features, from the angle of stimulating users' desire to buy. 🚀⭐ Includes a detailed data generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG retrieval-augmented generation 📚, TTS… ☆2,547 Updated this week
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising ☆2,448 Updated 4 months ago
- High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance ☆1,874 Updated last month
- The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling st… ☆1,856 Updated this week
- Inference and training library for high-quality TTS models. ☆4,592 Updated last week