World's first open-source real-time end-to-end spoken dialogue model with personalized voice cloning.
★541 · Apr 17, 2026 · Updated last month
Alternatives and similar repositories for FlashLabs-Chroma
Users that are interested in FlashLabs-Chroma are comparing it to the libraries listed below.
- A web tool for obtaining Bilibili cookies by scanning a QR code. ★278 · May 13, 2026 · Updated 2 weeks ago
- ★40 · Nov 18, 2025 · Updated 6 months ago
- A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos. ★69 · Oct 28, 2024 · Updated last year
- FlowMirror-HydraVox: A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens… ★49 · Feb 17, 2026 · Updated 3 months ago
- Fast audio super-resolution from 16 kHz to 48 kHz. ★209 · Jan 3, 2026 · Updated 4 months ago
- A codebase for data crawling and preprocessing for TTS and ASR systems training. ★23 · Feb 26, 2026 · Updated 3 months ago
- The official implementation of Freeze-Omni. ★15 · Jul 10, 2025 · Updated 10 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ★154 · May 30, 2025 · Updated 11 months ago
- AlphaFace: High Fidelity and Real-time Face Swapper Robust to Facial Pose ★52 · Mar 28, 2026 · Updated 2 months ago
- ★478 · May 19, 2025 · Updated last year
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ★161 · Mar 26, 2026 · Updated 2 months ago
- A text-to-speech repository with a learnable audio encoder that needs no alignment with a reference transcript. ★53 · Sep 20, 2025 · Updated 8 months ago
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ★23 · May 19, 2026 · Updated last week
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models ★31 · Feb 13, 2026 · Updated 3 months ago
- Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC… ★71 · Apr 27, 2026 · Updated last month
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… ★658 · Jun 9, 2024 · Updated last year
- ★11 · Mar 1, 2024 · Updated 2 years ago
- ★184 · Aug 25, 2025 · Updated 9 months ago
- ★32 · Sep 14, 2022 · Updated 3 years ago
- One-shot TTS with Improved Unseen Speaker and Style Transfer ★37 · Mar 2, 2022 · Updated 4 years ago
- Get aid from local LLMs right in your PowerShell ★16 · May 2, 2025 · Updated last year
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ★151 · Jan 1, 2025 · Updated last year
- VocalVerse: A powerful vocal evaluation framework powered by the Qwen LLMs ★46 · May 11, 2026 · Updated 2 weeks ago
- Repository for the paper "Combining audio control and style transfer using latent diffusion", accepted at ISMIR 2024 ★66 · Feb 19, 2025 · Updated last year
- VectorTalker: SVG Talking Face Generation with Progressive Vectorisation ★14 · Dec 25, 2023 · Updated 2 years ago
- Transformer based ASR Engine. ★13 · Aug 23, 2021 · Updated 4 years ago
- Python data pipeline to acquire, clean, and calculate vegetation indices from Sentinel-2 satellite image. Package is available only for o… ★14 · Feb 9, 2022 · Updated 4 years ago
- Soprano: Instant, Ultra-Realistic Text-to-Speech ★1,234 · Jan 15, 2026 · Updated 4 months ago
- TeleMem is a high-performance drop-in replacement for Mem0, featuring semantic deduplication, long-term dialogue memory, and multimodal v… ★461 · May 8, 2026 · Updated 3 weeks ago
- ★302 · Jul 22, 2025 · Updated 10 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ★206 · Apr 30, 2026 · Updated 3 weeks ago
- ★493 · May 6, 2025 · Updated last year
- X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech… ★206 · May 20, 2026 · Updated last week
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ★26 · Jul 2, 2024 · Updated last year
- [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching ★1,303 · May 18, 2026 · Updated last week
- PyTorch implementation of the paper "M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis" ★123 · Dec 18, 2025 · Updated 5 months ago
- Exploring Binary Classification Loss for Speaker Verification ★18 · Jul 18, 2023 · Updated 2 years ago
- Easy-to-Use Speech MOS predictors ★354 · Oct 24, 2023 · Updated 2 years ago
- [ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows ★140 · Sep 2, 2025 · Updated 8 months ago