☆35Jun 9, 2025Updated 9 months ago
Alternatives and similar repositories for SparkVox
Users that are interested in SparkVox are comparing it to the libraries listed below
Sorting:
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆35Aug 30, 2025Updated 6 months ago
- ☆10Sep 25, 2024Updated last year
- An AR+AR TTS attempt.☆18Jan 13, 2025Updated last year
- Event Relation in Text-to-Audio (TTA) Generation☆20Feb 26, 2025Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 4 months ago
- A Benchmark and Evaluation Suite for Zero-shot Singing Voice Synthesis☆24Feb 11, 2026Updated last month
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…☆63Dec 26, 2025Updated 2 months ago
- A pitch detection model trained to be robust against noise and reverberation environments.☆27Jan 21, 2025Updated last year
- Arabic Grapheme-to-Phoneme (G2P) Conversion☆13Mar 15, 2025Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 11 months ago
- Thai Grapheme to Phoneme (G2P) Wiktionary Corpus☆13Jul 25, 2022Updated 3 years ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 4 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆108May 5, 2025Updated 10 months ago
- ☆68Dec 30, 2025Updated 2 months ago
- This is the official implementation for εar-VAE model including inference and evaluation parts, more details coming soon...☆68Feb 13, 2026Updated last month
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 6 months ago
- ☆100Jan 19, 2026Updated 2 months ago
- ☆36Sep 6, 2025Updated 6 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆64Nov 5, 2025Updated 4 months ago
- Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation☆138Mar 8, 2026Updated last week
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open …☆23Updated this week
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆109Dec 20, 2025Updated 3 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'☆156Mar 24, 2025Updated 11 months ago
- ☆13Sep 12, 2024Updated last year
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆93Dec 28, 2024Updated last year
- ☆18Aug 23, 2024Updated last year
- List of Podcast Feeds using iTunes API and script to download 6,000,000~ hours of English speech.☆31Apr 13, 2023Updated 2 years ago
- [ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?☆43Nov 21, 2025Updated 4 months ago
- Trainging, inference, and testing of the SAC speech codec model.☆100Nov 1, 2025Updated 4 months ago
- Official Repository of Paper: "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios"(AAAI 2026)☆91Jan 31, 2026Updated last month
- poorman's ar-dit tts☆45Dec 31, 2025Updated 2 months ago
- [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis☆38Dec 24, 2025Updated 2 months ago
- (WIP) A retrain of F5-TTS on permissively-licensed data☆13Apr 6, 2025Updated 11 months ago
- [ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"☆108Jan 17, 2025Updated last year
- SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)☆108Aug 1, 2025Updated 7 months ago
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor☆18Jun 5, 2023Updated 2 years ago
- EMO-SUPERB submission☆51Oct 13, 2025Updated 5 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"☆213Sep 19, 2024Updated last year
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…☆45Sep 5, 2025Updated 6 months ago