QwenLM / Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,230 Updated 3 months ago
Related projects
Alternatives and complementary repositories for Qwen2-Audio
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,490 Updated 4 months ago
- ☆545 Updated 5 months ago
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,092 Updated 2 weeks ago
- ☆287 Updated 3 months ago
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM ☆964 Updated 3 weeks ago
- Multilingual Voice Understanding Model ☆3,450 Updated last month
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆781 Updated 3 weeks ago
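- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,289 Updated last week
- The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using text-input large models, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce. ☆537 Updated last year
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. ☆1,565 Updated 2 weeks ago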
- ☆1,045 Updated 5 months ago
- StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis. ☆958 Updated 2 months ago
- 🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI. ☆854 Updated last week
- ☆173 Updated last month
- Awesome speech/audio LLMs, representation learning, and codec models ☆701 Updated this week
- Speech, Language, Audio, Music Processing with Large Language Model ☆581 Updated this week
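- Real-time voice interactive digital human, supporting an end-to-end voice solution (GLM-4-Voice - THG) and a cascaded solution (ASR-LLM-TTS-THG). Customizable appearance and voice, no training required, supports voice cloning, with first-packet latency as low as 3s. ☆413 Updated this week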
- An Open-Sourced LLM-empowered Foundation TTS System ☆440 Updated last month
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆189 Updated this week
- SpeechGPT Series: Speech Large Language Models ☆1,293 Updated 3 months ago
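- Real-time STT connected to the OpenAI API/Zhipu AI (streaming LLM) and GPT-SOVITS/Edge-TTS, making cross-network service calls through a web page to achieve real-time conversation. ☆254 Updated 4 months ago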
- Fuse ChatTTS with OpenVoice, upload a 10-second audio clip, and clone your personalized ChatTTS voice. ☆363 Updated 2 weeks ago
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,122 Updated 2 months ago
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆888 Updated 4 months ago
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆3,153 Updated last month
- Officially recommended ChatTTS resource collection project, compiling related resources and frequently asked questions from across the web ☆1,234 Updated 4 months ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆813 Updated 4 months ago
- ☆878 Updated 5 months ago
- SALMONN: Speech Audio Language Music Open Neural Network ☆1,057 Updated this week
- KAN-TTS is a speech-synthesis training framework, please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-… ☆494 Updated 10 months ago