opendilab / CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!
☆389 Updated last month
Alternatives and similar repositories for CleanS2S:
Users who are interested in CleanS2S are comparing it to the repositories listed below
- ☆195 Updated 6 months ago
- PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR… ☆182 Updated last month
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆305 Updated 3 months ago
- An API extension for GraphRAG that can be called through an API and embedded in your own web service ☆120 Updated 4 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆236 Updated 3 weeks ago
- Pseudo Streaming SenseVoice with Hotwords ☆245 Updated last month
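- Multi-Agent-GPT: a multimodal expert assistant GPT built on RAG and agents. It integrates tools for modalities such as text, image, and audio, and supports local deployment and building private databases. ☆222 Updated last month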
- GPT-4o-level, real-time spoken dialogue system. ☆314 Updated 2 months ago
- StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis. ☆1,058 Updated 7 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆438 Updated last week
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆180 Updated last month
- Video QA Assistant based on LLMs with frame convolution ☆211 Updated last year
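- A web service built on Node.js, Vue3, and uniapp offering ChatGPT + agents + Midjourney image generation + PPT generation + Suno music + Pika/Runway/Sora video | A private AIGC platform for individuals, teams, and enterprises ☆245 Updated last week
- Accelerating cosyvoice2 inference with vllm ☆215 Updated last week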
- ☆223 Updated 2 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode… ☆201 Updated 3 months ago
- StyleLLM writing-style model: a text style transfer project based on large language models. #TextEmbellishment #Polishing #StyleImitation ☆293 Updated 10 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆159 Updated last week
- General-purpose LLM × writing-style LLM = chatbots with diverse styles ☆44 Updated 9 months ago
- Sample Repository for the AlibabaCloud Bailian Speech SDK ☆159 Updated 3 weeks ago
- ☆203 Updated 4 months ago
- Reproduction of the llama-omni training code ☆59 Updated 2 months ago
- OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU. ☆357 Updated this week
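- Intelligent OCR recognition of ID cards, document extraction, and automated CAPTCHA parsing, built on deep learning. Discussion of the full pipeline is welcome, from data collection and data annotation through model training, model evaluation, and model service deployment. All self-trained and fine-tuned models are free to use; follow along for more models to come. V:chenganp ☆159 Updated 4 months ago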
- ☆681 Updated 10 months ago
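- This project open-sources a NextJS-based front end, intended as a reference web front-end platform for generative-AI text-to-video, especially for films from screenwriting through video generation. Everyone can become a director. The Nextjs front-end of an AI driven pla… ☆192 Updated last year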
- Streaming ASR and TTS based on FastAPI + sherpa-onnx ☆95 Updated 6 months ago
- A Template Based Report Rendering Platform. ☆326 Updated 9 months ago
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLM ☆84 Updated this week
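- A real-time voice conversation system based on Tongyi Qianwen Qwen2.5-Omni that uses the online API service and supports real-time voice interaction, dynamic voice activity detection, and streaming audio processing. A real-time voice conversation system based on Qwen2.5-Omni Online-API, … ☆36 Updated last week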