esnya / realtime-whisper
ASR (Automatic Speech Recognition) for real-time streamed audio powered by Whisper and transformers
☆17Updated last month
Related projects:
- Grapheme-to-Phoneme lexicons for Chinese dialects☆67Updated last year
- SenseVoice-python: An enterprise-grade, open-source, multi-language ASR system from FunASR, with onnxruntime☆55Updated 3 weeks ago
- We Speech Transcript based on LLM, in 300 lines of code.☆117Updated last month
- ONNX Inference of Pyannote Segmentation☆54Updated last week
- An enterprise-grade Voice Activity Detector from modelscope and funasr.☆49Updated last year
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability.☆169Updated 3 weeks ago
- Real-time Voice Activity Detection (VAD) with example use cases such as a simple voice bot and live (real-time) transcription☆43Updated 3 months ago
- Awesome TTS☆48Updated 3 years ago
- A Chinese punctuation model that adds punctuation marks to text.☆128Updated 6 months ago
- flow mirror models from JZX AI Labs☆33Updated last week
- An end-to-end voice wake-up toolkit, from model training to model inference.☆64Updated 2 weeks ago
- Putting flows on top of neural transducers for better TTS☆63Updated last month
- Sample Repository for the AlibabaCloud Bailian Speech SDK☆26Updated 2 weeks ago
- Fine-Tune Whisper with Transformers and PEFT☆26Updated 10 months ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆335Updated last week
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆74Updated 8 months ago
- An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io☆66Updated last year
- Zero-shot voice conversion with in-context learning☆163Updated this week
- Port of Funasr's Sense-voice model in C/C++☆88Updated last week
- lina-speech: linear-attention-based text-to-speech☆111Updated 3 months ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines☆81Updated 4 months ago
- A fast speech-to-any translation model that supports simultaneous decoding and offers 28× speedup.☆60Updated last month
- An enterprise-grade Chinese-English code-switching punctuator from funasr.☆14Updated 4 months ago
- ONNX implementation of Whisper. PyTorch-free.☆79Updated last month
- Efficient approach to speaker diarization using voice characteristics extraction☆56Updated 4 months ago
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection☆171Updated 2 weeks ago
- Speech Diarization for scrum automation☆94Updated last year
- SyntaSpeech: Syntax-aware Generative Adversarial Text-to-Speech; IJCAI 2022; Official code☆193Updated 2 years ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…☆344Updated 7 months ago
- Improves the accuracy of pypinyin using g2pW☆73Updated last year