opendilab / CleanS2S
High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice interaction prototype agent implemented in just one file!
☆496Updated last month
Alternatives and similar repositories for CleanS2S
Users who are interested in CleanS2S are comparing it to the libraries listed below
- Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations…☆383Updated 7 months ago
- PengChengStarling is specifically designed for developing multilingual ASR models based on the icefall project, supporting a complete ASR…☆186Updated 11 months ago
- StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.☆1,238Updated 7 months ago
- ☆204Updated last year
- A Fully Self-Hosted Solution for Full-Duplex Voice Interaction☆471Updated 4 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆362Updated 8 months ago
- General-purpose LLM × writing-style LLM = chatbots with diverse styles☆52Updated last year
- StyleLLM writing-style LLM: a text style transfer project based on large language models. #text-embellishment #polishing #style-imitation☆350Updated last year
- Video QA Assistant based on LLMs with frame convolution☆215Updated 2 years ago
- ☆484Updated 9 months ago
- Accelerating cosyvoice2 inference with vllm☆481Updated 9 months ago
- GPT-4o-level, real-time spoken dialogue system.☆369Updated last year
- Pseudo Streaming SenseVoice with Hotwords☆423Updated 10 months ago
- An API extension for Graphrag that can be called via API and embedded in your own web service☆116Updated last year
- Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.☆873Updated last week
- A real-time voice conversation system based on the Tongyi Qianwen Qwen2.5-Omni online API service, supporting real-time voice interaction, dynamic voice activity detection, and streaming audio processing. …☆83Updated 8 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆314Updated 10 months ago
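- Multi-Agent-GPT: a multimodal expert assistant GPT built on RAG and agents. It integrates tools for modalities such as text, image, and audio, and supports local deployment and private database construction.☆255Updated 11 months ago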
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆219Updated last year
- A reproduction of the llama-omni training code☆73Updated last year
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆694Updated 2 months ago
- PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis"☆113Updated last month
- ☆193Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆475Updated 2 months ago
- Ultra-low bitrate speech codec (0.27-1 kbps) with cross-modal alignment and real-time capabilities☆214Updated 5 months ago
- ☆343Updated 9 months ago
- Fixes the bug in funasr where seaco-paraformer has no timestamps after being exported to onnx☆24Updated last year
- Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptat…☆225Updated 2 months ago
- A package for parsing PDFs and analyzing their content using LLMs.☆270Updated last year
- ☆242Updated 11 months ago