wenet-e2e / west
We Speech Transcript based on LLM, in 300 lines of code.
☆126Updated 2 months ago
Related projects
Alternatives and complementary repositories for west
- A toolkit for speaker diarization.☆137Updated 2 weeks ago
- A lightweight end-to-end text-to-speech model☆90Updated last month
- SenseVoice-python: An enterprise-grade open-source multilingual ASR system from funasr, running on onnxruntime☆72Updated last month
- Pseudo Streaming SenseVoice with Hotwords☆73Updated last week
- ☆165Updated last month
- flow mirror models from JZX AI Labs☆40Updated last month
- LlamaVoice is a llama-based large voice generation model that supports both inference and training.☆219Updated 2 months ago
- Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English☆71Updated this week
- An enterprise-grade Voice Activity Detector from modelscope and funasr.☆59Updated last year
- Open source inference code for Rev's model☆329Updated last week
- An end-to-end voice wake-up toolkit, from model training to model inference.☆73Updated 2 months ago
- Port of Funasr's Sense-voice model in C/C++☆157Updated 2 weeks ago
- Huawei Grad-TTS for Chinese☆45Updated last year
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios☆44Updated 3 months ago
- Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice☆128Updated 3 weeks ago
- SpeechDenoiser: Real-Time Speech Denoising with ONNX. Welcome to SpeechDenoiser, a simple and effective solution for real-time speech den…☆43Updated 2 months ago
- ☆36Updated 3 months ago
- 百聆 is a GPT-4o-like voice chat bot built from ASR + LLM + TTS, with latency as low as 800 ms; it runs on low-end hardware and supports interruption.☆35Updated last month
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆363Updated last month
- noise reduction☆17Updated 4 months ago
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement☆114Updated last week
- Speech recognition built on FunASR, with a standard version and an ONNX version (recommended).☆24Updated 3 weeks ago
- The YouTube Text-To-Speech dataset consists of waveform audio extracted from YouTube videos alongside their English transcriptions☆50Updated 3 years ago
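- The Bert-VITS2 project has many bugs and an unfriendly tutorial. This project fixes as many Bert-VITS2 bugs as possible and supports one-click training. Only 50 utterances from the target speaker are needed to obtain a stable, fast TTS model.☆28Updated last month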
- Grapheme-to-Phoneme lexicons for Chinese dialects☆66Updated last year
- ☆65Updated last year
- This repo is an exploratory experiment to enable frozen pretrained RWKV language models to accept speech modality input. We followed the …☆31Updated last week
- VALL-E 2 reproduction☆83Updated 3 months ago
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection☆249Updated 2 months ago