ictnlp / Stream-Omni
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
☆304 · Updated 3 weeks ago
Alternatives and similar repositories for Stream-Omni
Users interested in Stream-Omni are comparing it to the libraries listed below.
- PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR… ☆184 · Updated 4 months ago
- High-quality, streaming speech-to-speech interactive agent in a single file: a streaming, full-duplex voice-interaction prototype agent implemented in just one file! ☆454 · Updated 3 weeks ago
- Stay ahead of AI trends with automated Reddit insights! 🚀 This tool scans AI-related Reddit communities in English & Chinese, using Redd… ☆175 · Updated this week
- ☆125 · Updated 2 months ago
- GPT-4o-level, real-time spoken dialogue system. ☆343 · Updated 5 months ago
- GitHub repository for the ACL 2025 paper "Recent Advances in Speech Language Models: A Survey". ☆104 · Updated 3 weeks ago
- ☆421 · Updated 2 months ago
- MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting… ☆353 · Updated this week
- ☆439 · Updated last month
- ☆201 · Updated 9 months ago
- 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch. ☆88 · Updated last month
- PodAgent: A Comprehensive Framework for Podcast Generation ☆103 · Updated last month
- We Speech Transcript based on LLM, in 300 lines of code. ☆170 · Updated 3 weeks ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆551 · Updated 3 weeks ago
- ☆270 · Updated 3 months ago
- The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables" ☆64 · Updated 2 months ago
- StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding ☆133 · Updated last month
- An extension of ChatTTS: 3x faster on Windows, with support for voice cloning and mobile deployment ☆167 · Updated 5 months ago
- Doge family of small language models ☆152 · Updated this week
- ☆232 · Updated 4 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆202 · Updated 4 months ago
- ☆67 · Updated 10 months ago
- LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in … ☆506 · Updated 2 weeks ago
- ☆164 · Updated 5 months ago
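- A real-time voice conversation system based on Qwen2.5-Omni (Tongyi Qianwen) that uses the online API service and supports real-time voice interaction, dynamic voice activity detection, and streaming audio processing. ☆59 · Updated 2 months ago
- Reproduction of the llama-omni training code ☆65 · Updated 5 months ago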
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆274 · Updated 3 months ago
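- A web service built on Node.js, Vue3, and uniapp offering ChatGPT + agents + Midjourney image generation + PPT generation + Suno music + Pika/Runway/Sora video | A privately deployable AIGC platform for individuals, teams, and enterprises ☆255 · Updated last week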
- OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU. ☆375 · Updated last week
- ☆183 · Updated 2 months ago