Mobile-Artificial-Intelligence / babylon.cpp

Babylon.cpp is a C and C++ library for grapheme-to-phoneme conversion and text-to-speech synthesis. Phonemization uses an ONNX Runtime port of the DeepPhonemizer model, and speech synthesis uses VITS models. Piper models are compatible once they have been run through a conversion script.

☆29

Alternatives and similar repositories for babylon.cpp

Users who are interested in babylon.cpp are comparing it to the libraries listed below.

taylorchu / 2cent-tts
☆55 Updated 3 weeks ago
EndlessReform / smoltts
Open TTS models, built for streaming on the edge
☆45 Updated 10 months ago
balisujohn / tortoise.cpp
A ggml (C++) re-implementation of tortoise-tts
☆193 Updated last year
rhasspy / piper-phonemize
C++ library for converting text to phonemes for Piper
☆139 Updated 6 months ago
zhaohb / MeloTTS-OV
Using OpenVINO to speed up MeloTTS inference
☆15 Updated last year
fquirin / speech-recognition-experiments
Experiments to test different speech recognition systems for SEPIA Framework
☆63 Updated 2 years ago
neuphonic / neucodec
A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec.
☆149 Updated last week
Picovoice / orca
On-device streaming text-to-speech engine powered by deep learning
☆128 Updated 2 weeks ago
neurlang / goruut
IPA Phonemizer/Dephonemizer for 140 human languages
☆53 Updated 3 weeks ago
gooofy / zerovox
zero-shot realtime TTS system, fully offline, free and open source
☆50 Updated 9 months ago
mesolitica / vllm-whisper
A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper
☆31 Updated last year
elyxlz / voxtral
Voxtral: Convert Mistral into an end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP…
☆40 Updated 11 months ago
duerig / StyleTTS2
StyleTTS 2 Optimized Training Fork
☆33 Updated last year
frothywater / kanade-tokenizer
Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative model…
☆47 Updated last week
maxilevi / vits.cpp
a cpp ggml port of "VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech." for use in mobile…
☆43 Updated last year
nmfisher / sherpa_onnx_dart
Dart plugin wrapping the Sherpa-ONNX runtime. Contains example for speech recognition with Flutter
☆22 Updated last year
indri-voice / audiotoken
Audio tokenization, in the fastest way possible!
☆53 Updated last year
Picovoice / koala
On-device noise suppression powered by deep learning
☆82 Updated 2 weeks ago
clement-pages / gryannote
Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.
☆70 Updated 3 months ago
anan235 / dia-multilingual
A TTS model capable of generating ultra-realistic dialogue in one pass.
☆219 Updated 9 months ago
fakerybakery / OpenF5-TTS
(WIP) A retrain of F5-TTS on permissively-licensed data
☆13 Updated 10 months ago
thomasgauthier / csm-hf
Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers
☆57 Updated 8 months ago
ex3ndr / supervoice-voicebox
VoiceBox neural network implementation
☆110 Updated last year
thevoicecompany / gazelle-train
Joint speech-language model - respond directly to audio!
☆30 Updated last year
sidharthrajaram / StyleTTS2
🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning
☆161 Updated last year
rendchevi / daisy-tts
🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition
☆14 Updated 2 months ago
dangtr0408 / StyleTTS2-lite
A lightweight, efficient variation of the StyleTTS 2 text‐to‐speech model.
☆52 Updated 8 months ago
ekwek1 / soprano-factory
Soprano-Factory: Train your own 2000x realtime text-to-speech model
☆203 Updated 3 weeks ago
Picovoice / cobra
On-device voice activity detection (VAD) powered by deep learning
☆242 Updated 2 weeks ago
taresh18 / TTSizer
🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨
☆135 Updated 5 months ago