Mobile-Artificial-Intelligence / babylon.cpp
Babylon.cpp is a C and C++ library for grapheme-to-phoneme conversion and text-to-speech synthesis. For phonemization, an ONNX Runtime port of the DeepPhonemizer model is used. For speech synthesis, VITS models are used. Piper models are compatible after a conversion script is run.
☆29 Updated 5 months ago
Alternatives and similar repositories for babylon.cpp
Users interested in babylon.cpp are comparing it to the libraries listed below.
- ☆55 Updated 3 weeks ago
- Open TTS models, built for streaming on the edge ☆45 Updated 10 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆193 Updated last year
- C++ library for converting text to phonemes for Piper ☆139 Updated 6 months ago
- Using OpenVINO to speed up MeloTTS inference ☆15 Updated last year
- Experiments to test different speech recognition systems for SEPIA Framework ☆63 Updated 2 years ago
- A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec. ☆149 Updated last week
- On-device streaming text-to-speech engine powered by deep learning ☆128 Updated 2 weeks ago
- IPA Phonemizer/Dephonemizer for 140 human languages ☆53 Updated 3 weeks ago
- zero-shot realtime TTS system, fully offline, free and open source ☆50 Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper ☆31 Updated last year
- Voxtral: Convert Mistral into a end2end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GP… ☆40 Updated 11 months ago
- StyleTTS 2 Optimized Training Fork ☆33 Updated last year
- Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative model… ☆47 Updated last week
- a cpp ggml port of "VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech." for use in mobile… ☆43 Updated last year
- Dart plugin wrapping the Sherpa-ONNX runtime. Contains example for speech recognition with Flutter ☆22 Updated last year
- Audio tokenization, in the fastest way possible! ☆53 Updated last year
- On-device noise suppression powered by deep learning ☆82 Updated 2 weeks ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆70 Updated 3 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆219 Updated 9 months ago
- (WIP) A retrain of F5-TTS on permissively-licensed data ☆13 Updated 10 months ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆57 Updated 8 months ago
- VoiceBox neural network implementation ☆110 Updated last year
- Joint speech-language model - respond directly to audio! ☆30 Updated last year
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆161 Updated last year
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆14 Updated 2 months ago
- A lightweight, efficient variation of the StyleTTS 2 text‐to‐speech model. ☆52 Updated 8 months ago
- Soprano-Factory: Train your own 2000x realtime text-to-speech model ☆203 Updated 3 weeks ago
- On-device voice activity detection (VAD) powered by deep learning ☆242 Updated 2 weeks ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆135 Updated 5 months ago