gooofy / zerovox
zero-shot realtime TTS system, fully offline, free and open source
☆47 Updated 6 months ago
Alternatives and similar repositories for zerovox
Users who are interested in zerovox are comparing it to the libraries listed below.
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ☆128 Updated 2 months ago
- Simple and lightweight Zero-shot Text-to-Speech (TTS) synthesis model ☆35 Updated 6 months ago
- StyleTTS 2 Optimized Training Fork ☆34 Updated 8 months ago
- High quality text-to-speech based on StyleTTS 2. ☆67 Updated last week
- (WIP) A retrain of F5-TTS on permissively-licensed data ☆13 Updated 6 months ago
- Export an ONNX graph that performs ISTFT. Designed for TTS models. ☆27 Updated last year
- A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec. ☆104 Updated 3 weeks ago
- An unofficial PyTorch implementation of the StreamVC (Real-Time Low-Latency Voice Conversion) ☆128 Updated last year
- SoTA open-source TTS ☆102 Updated 4 months ago
- An unofficial pytorch implementation of "STREAMVC: REAL-TIME LOW-LATENCY VOICE CONVERSION". ☆78 Updated 6 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆102 Updated last year
- PitchVC: Pitch Conditioned Any-to-Many Voice Conversion ☆35 Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆68 Updated last week
- An unofficial PyTorch implementation of VALL-E ☆88 Updated 2 months ago
- ☆50 Updated last week
- ☆28 Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆125 Updated 3 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts). ☆88 Updated 3 weeks ago
- a Frontier Japanese Speech Generation net ☆56 Updated 5 months ago
- Your one-stop solution for voice dataset creation ☆127 Updated last year
- Automatically clean, enhance, segment, filter, and format a dataset to fine-tune or train a voice model. ☆43 Updated last month
- StyleTTS2 + Vocos as a Decoder ☆13 Updated 7 months ago
- ☆50 Updated 7 months ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion ☆27 Updated last month
- ☆44 Updated 3 months ago
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. ☆254 Updated last year
- ☆28 Updated last year
- A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS ☆50 Updated 10 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆185 Updated last month
- A collection of all our phonemizers for dataset construction and inference ☆27 Updated 8 months ago