ictnlp / Stream-Omni
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
☆343Updated 3 months ago
Alternatives and similar repositories for Stream-Omni
Users who are interested in Stream-Omni are comparing it to the libraries listed below.
- High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!☆461Updated 3 months ago
- PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR…☆181Updated 6 months ago
- We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven av…☆1,023Updated this week
- ☆121Updated 2 weeks ago
- Stay ahead of AI trends with automated Reddit insights! 🚀 This tool scans AI-related Reddit communities in English & Chinese, using Redd…☆187Updated this week
- Github repository for ACL 2025 paper: Recent Advances in Speech Language Models: A Survey.☆140Updated 3 months ago
- GPT-4o-level, real-time spoken dialogue system.☆355Updated 7 months ago
- This is the code related to "🔥Effective Training Data Synthesis for Improving MLLM Chart Understanding" (ICCV 2025).☆54Updated last month
- Doge Family of Small Language Models☆174Updated last month
- ☆458Updated 4 months ago
- PodAgent: A Comprehensive Framework for Podcast Generation☆116Updated 4 months ago
- We Speech Transcript based on LLM, in 300 lines of code.☆176Updated 3 months ago
- ☆311Updated 5 months ago
- StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding☆133Updated 4 months ago
- ☆201Updated 11 months ago
- Long-form streaming TTS system for multi-speaker dialogue generation☆434Updated last week
- ☆451Updated 4 months ago
- ☆385Updated this week
- ☆173Updated 5 months ago
- ☆68Updated last year
- 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.☆76Updated 3 months ago
- Text-audio foundation model from Boson AI☆104Updated 2 weeks ago
- OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆83Updated 2 months ago
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment☆170Updated 7 months ago
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios☆67Updated 2 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆209Updated 6 months ago
- VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model☆342Updated 5 months ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…☆1,080Updated this week
- Reproduction of the llama-omni training code☆69Updated 7 months ago
- Flash Dynamic Mask Attention☆287Updated this week