ictnlp / StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
☆1,027 Updated 5 months ago
Alternatives and similar repositories for StreamSpeech:
Users who are interested in StreamSpeech are comparing it to the libraries listed below:
- Interface for OuteTTS models. ☆926 Updated last week
- ☆1,154 Updated 8 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆879 Updated 3 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,513 Updated 6 months ago
- A Fast TTS Engine ☆451 Updated 3 weeks ago
- Open source inference code for Rev's model ☆377 Updated last month
- Local SRT/LLM/TTS Voicechat ☆620 Updated 4 months ago
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆838 Updated this week
- ☆1,113 Updated last week
- Fuse ChatTTS with OpenVoice, upload a 10-second audio clip, and clone your personalized ChatTTS voice. ☆408 Updated 3 months ago
- An Open-Sourced LLM-empowered Foundation TTS System ☆601 Updated 4 months ago
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection ☆584 Updated 2 months ago
- ☆327 Updated 6 months ago
- Whisper with Medusa heads ☆822 Updated last week
- first base model for full-duplex conversational audio ☆1,707 Updated last month
- TTS with kokoro and onnx runtime ☆1,614 Updated last week
- High-quality and streaming Speech-to-Speech interactive agent in a single file. A streaming, full-duplex voice-interaction prototype agent implemented in just one file! ☆336 Updated last month
- Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯 ☆799 Updated last month
- 🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI. ☆1,046 Updated last week
- A toolkit for speaker diarization. ☆172 Updated 3 months ago
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,597 Updated 7 months ago
- Controllable and fast Text-to-Speech for over 7000 languages! ☆1,545 Updated 3 months ago
- zero-shot voice conversion & singing voice conversion, with real-time support ☆1,080 Updated this week
- VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design ☆536 Updated last year
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,251 Updated last week
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,157 Updated 3 months ago
- AI powered speech denoising and enhancement ☆1,641 Updated 2 months ago