ictnlp / Stream-Omni
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
☆348 Updated 3 months ago
Alternatives and similar repositories for Stream-Omni
Users who are interested in Stream-Omni are comparing it to the repositories listed below.
- High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file! ☆464 Updated 4 months ago
- PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR… ☆181 Updated 7 months ago
- We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven av… ☆1,079 Updated 3 weeks ago
- ☆124 Updated last month
- GPT-4o-level, real-time spoken dialogue system. ☆359 Updated 8 months ago
- Github repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey. ☆140 Updated 3 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆760 Updated 3 weeks ago
- Long-form streaming TTS system for multi-speaker dialogue generation ☆795 Updated this week
- ☆457 Updated 5 months ago
- Doge Family of Small Language Models ☆178 Updated 2 months ago
- ☆203 Updated last year
- ☆465 Updated 4 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆638 Updated 2 months ago
- MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting… ☆974 Updated 2 weeks ago
- StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding ☆134 Updated 5 months ago
- ☆319 Updated 6 months ago
- PodAgent: A Comprehensive Framework for Podcast Generation ☆120 Updated 5 months ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,139 Updated 3 weeks ago
- This is the code related to "🔥Effective Training Data Synthesis for Improving MLLM Chart Understanding" (ICCV 2025). ☆65 Updated 2 months ago
- Stay ahead of AI trends with automated Reddit insights! 🚀 This tool scans AI-related Reddit communities in English & Chinese, using Redd… ☆720 Updated this week
- ☆175 Updated 8 months ago
- StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis. ☆1,163 Updated 3 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆177 Updated 3 months ago
- ☆240 Updated 7 months ago
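- A real-time voice conversation system based on Tongyi Qianwen Qwen2.5-Omni that uses the online API service and supports real-time voice interaction, dynamic voice activity detection, and streaming audio processing. A real-time voice conversation system based on Qwen2.5-Omni Online-API, … ☆74 Updated 5 months ago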
- Efficient audio understanding with general audio captions ☆362 Updated 2 weeks ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆300 Updated 6 months ago
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment ☆170 Updated 8 months ago
- A Fully Self-Hosted Solution for Full-Duplex Voice Interaction ☆362 Updated 2 weeks ago
- 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch. ☆76 Updated 4 months ago