mush42 / optispeech
A lightweight end-to-end text-to-speech model
☆112 · Updated last month
Alternatives and similar repositories for optispeech:
Users who are interested in optispeech are comparing it to the libraries listed below:
- We Speech Transcript based on LLM, in 300 lines of code. ☆157 · Updated last month
- A toolkit for speaker diarization. ☆180 · Updated 2 weeks ago
- SenseVoice-python: An enterprise-grade, open-source, multi-language ASR system from funasr, running on onnxruntime. ☆88 · Updated 6 months ago
- ☆159 · Updated 4 months ago
- F5-TTS inference acceleration, roughly a 4x speedup! ☆71 · Updated 3 months ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆103 · Updated this week
- Speech Diarization for scrum automation ☆102 · Updated last year
- Running the F5-TTS by ONNX Runtime ☆142 · Updated last week
- Open source inference code for Rev's model ☆395 · Updated last month
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆231 · Updated 7 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆247 · Updated last month
- Nendo is an open source platform for AI-driven audio management, intelligence, and generation. ☆120 · Updated last year
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3 ☆400 · Updated 7 months ago
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud… ☆93 · Updated 3 months ago
- OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU. ☆354 · Updated 3 weeks ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆95 · Updated 6 months ago
- flow mirror models from JZX AI Labs ☆44 · Updated 6 months ago
- Cantonese Text to Speech with VITS implementation ☆29 · Updated 2 years ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio. ☆161 · Updated 10 months ago
- An enterprise-grade Voice Activity Detector from modelscope and funasr. ☆91 · Updated last year
- ☆29 · Updated last month
- An unofficial PyTorch implementation of VALL-E ☆87 · Updated this week
- ☆193 · Updated 6 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆412 · Updated this week
- Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English. ☆95 · Updated 3 weeks ago
- Real-time faster-whisper with Gradio ☆26 · Updated 6 months ago
- Collection of Open Source Speech Data ☆153 · Updated 5 months ago
- Dynamic Voice Actor Assignment and Emotional Narration for Realistic Story Play ☆40 · Updated last week
- Application of MB-iSTFT-VITS components to vits2_pytorch ☆126 · Updated 4 months ago
- The YouTube Text-To-Speech dataset comprises waveform audio extracted from YouTube videos alongside their English transcriptions ☆51 · Updated 4 years ago