☆34Jun 9, 2025Updated 8 months ago
Alternatives and similar repositories for SparkVox
Users that are interested in SparkVox are comparing it to the libraries listed below
Sorting:
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆31Aug 30, 2025Updated 6 months ago
- Event Relation in Text-to-Audio (TTA) Generation☆20Feb 26, 2025Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 3 months ago
- An AR+AR TTS attempt.☆18Jan 13, 2025Updated last year
- A pitch detection model trained to be robust against noise and reverberation environments.☆27Jan 21, 2025Updated last year
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…☆63Dec 26, 2025Updated 2 months ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- Repository for "Training Audio Captioning Models without Audio"☆10Sep 26, 2023Updated 2 years ago
- Arabic Grapheme-to-Phoneme (G2P) Conversion☆13Mar 15, 2025Updated 11 months ago
- A Benchmark and Evaluation Suite for Zero-shot Singing Voice Synthesis☆23Feb 11, 2026Updated 2 weeks ago
- ☆10Sep 25, 2024Updated last year
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 3 months ago
- ☆68Dec 30, 2025Updated 2 months ago
- Thai Grapheme to Phoneme (G2P) Wiktionary Corpus☆13Jul 25, 2022Updated 3 years ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆105May 5, 2025Updated 9 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".☆64Nov 5, 2025Updated 3 months ago
- ☆36Sep 6, 2025Updated 5 months ago
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12May 13, 2024Updated last year
- ☆18Aug 23, 2024Updated last year
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open …☆22Feb 7, 2026Updated 3 weeks ago
- (WIP) A retrain of F5-TTS on permissively-licensed data☆13Apr 6, 2025Updated 10 months ago
- ☆99Jan 19, 2026Updated last month
- Official Repository of Paper: "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios"(AAAI 2026)☆89Jan 31, 2026Updated last month
- ☆13Sep 12, 2024Updated last year
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor☆18Jun 5, 2023Updated 2 years ago
- This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…☆50Feb 4, 2026Updated 3 weeks ago
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆62Sep 5, 2025Updated 5 months ago
- Trainging, inference, and testing of the SAC speech codec model.☆99Nov 1, 2025Updated 4 months ago
- poorman's ar-dit tts☆45Dec 31, 2025Updated 2 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆43Mar 3, 2025Updated 11 months ago
- DiFlow-TTS delivers low-latency zero-shot TTS via discrete flow matching and factorized speech tokens. A compact, open framework for fast…☆53Updated this week
- FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS☆24Sep 9, 2024Updated last year
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆92Dec 28, 2024Updated last year
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 8 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'☆155Mar 24, 2025Updated 11 months ago
- [ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"☆108Jan 17, 2025Updated last year
- [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis☆36Dec 24, 2025Updated 2 months ago
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil…☆40Jun 17, 2025Updated 8 months ago
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”☆56Jun 1, 2025Updated 9 months ago