Mobile-Artificial-Intelligence / babylon.cpp
Babylon.cpp is a C and C++ library for grapheme-to-phoneme conversion and text-to-speech synthesis. Phonemization uses an ONNX Runtime port of the DeepPhonemizer model, and speech synthesis uses VITS models. Piper models are compatible once they have been run through a conversion script.
☆10 Updated 3 weeks ago
Related projects:
- a cpp ggml port of "VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech." for use in mobile… ☆29 Updated 3 weeks ago
- [Early Alpha] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activit… ☆20 Updated 4 months ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆27 Updated last year
- Supervoice diffusion enhance ☆24 Updated 2 months ago
- python bindings for symphonia/opus - read various audio formats from python and write opus files ☆16 Updated last week
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆10 Updated this week
- Unofficial implementation of wavenext vocoder ☆28 Updated 3 weeks ago
- Collection of scripts from mHuBERT-147. ☆21 Updated 2 months ago
- Production-ready vocoder using BigVSAN ☆11 Updated 7 months ago
- This is a TTS model based on VITS that can control the output speech emotion through natural language and control the speaker through ref… ☆4 Updated last month
- This will hold the crowdsourcing platform to be used to store voice data from various speakers which will act as input dataset for speech… ☆17 Updated last year
- Audio tokenization, in the fastest way possible! ☆45 Updated 3 weeks ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆22 Updated this week
- ☆12 Updated this week
- Aligner for text-to-speech ☆15 Updated 2 months ago
- Simple inference for Vits2 TTS Using ONNXRUNTIME and espeak-ng on C++ ☆11 Updated 5 months ago
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances … ☆30 Updated last week
- ☆26 Updated this week
- My vocoder experiments ☆20 Updated last month
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆27 Updated last month
- Zero-Shot Foreign Accent Conversion without a Native Reference ☆27 Updated 4 months ago
- Pytorch implementation of SoundCTM ☆68 Updated 3 weeks ago
- ☆12 Updated last year
- Whisper_MCE ☆13 Updated 3 months ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark ☆18 Updated 2 months ago
- Unofficial implementation of ConvNeXt-TTS powered by lightning and Rye ☆12 Updated 4 months ago
- ☆23 Updated last year
- Supervoice Speaker Separation Network ☆13 Updated 3 months ago
- ☆10 Updated last month
- Speech enhancement in noisy and reverberant environments using deep neural networks ☆15 Updated last month