StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
☆1,240Jun 29, 2025Updated 8 months ago
Alternatives and similar repositories for StreamSpeech
Users who are interested in StreamSpeech are comparing it to the libraries listed below.
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.☆77Oct 22, 2024Updated last year
- gocrawler, a distributed crawler framework in Go☆116Jun 4, 2024Updated last year
- Gromore ad plugin for Flutter☆123Apr 17, 2024Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆25Jul 2, 2024Updated last year
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion☆111Apr 1, 2024Updated last year
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"☆367Sep 3, 2024Updated last year
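- OCR training sample generator that automatically produces image samples and annotations for training OCR detection and recognition models☆136Aug 27, 2024Updated last year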
- Controllable and fast Text-to-Speech for over 7000 languages!☆2,188Jan 25, 2026Updated last month
- Multilingual Voice Understanding Model☆7,611Dec 30, 2025Updated 2 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆434Sep 13, 2024Updated last year
- [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching☆1,250Feb 23, 2026Updated last week
- Single-node and distributed rate limiting implemented in Go☆114Mar 18, 2024Updated last year
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆2,057Apr 21, 2025Updated 10 months ago
- An Open-source Streaming High-fidelity Neural Audio Codec☆498Mar 4, 2025Updated 11 months ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆974Jan 15, 2026Updated last month
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning☆159Jun 13, 2024Updated last year
- Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".☆63Jul 22, 2024Updated last year
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency☆59Oct 23, 2024Updated last year
- ☆54Jun 15, 2024Updated last year
- A Survey of Spoken Dialogue Models (60 pages)☆315Nov 28, 2024Updated last year
- High Concurrency in Practice: a beginner's guide to the RabbitMQ message queue☆43Jun 15, 2024Updated last year
- Snake game with rust☆35May 29, 2024Updated last year
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆61Apr 4, 2024Updated last year
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆647Jun 9, 2024Updated last year
- [ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec☆274Nov 22, 2024Updated last year
- The open source code for SimpleSpeech series☆145Oct 8, 2024Updated last year
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆19,786Feb 11, 2026Updated 2 weeks ago
- We Speech Transcript based on LLM, in 300 lines of code.☆184Jun 20, 2025Updated 8 months ago
- The Open Source Code of UniAudio☆605Jul 22, 2024Updated last year
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…☆3,530Nov 5, 2024Updated last year
- A simple evaluation system☆86Nov 2, 2025Updated 4 months ago
- [Findings of NAACL 2024] Source code of paper CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers a…☆69Mar 31, 2024Updated last year
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆107Jan 17, 2025Updated last year
- [ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations☆142Apr 27, 2024Updated last year
- SpeechGPT Series: Speech Large Language Models☆1,405Jul 22, 2024Updated last year
- LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform☆18May 17, 2024Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆9,750Feb 12, 2026Updated 2 weeks ago
- Just another FastSpeech 2 but cleaner code :)☆29Jun 28, 2024Updated last year
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model☆213Apr 26, 2024Updated last year