☆26 · Nov 3, 2025 · Updated 4 months ago
Alternatives and similar repositories for KaniTTS-Finetune-pipeline
Users who are interested in KaniTTS-Finetune-pipeline are comparing it to the libraries listed below.
- Arabic Grapheme-to-Phoneme (G2P) Conversion ☆13 · Mar 15, 2025 · Updated 11 months ago
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors ☆37 · Feb 11, 2025 · Updated last year
- SpeechPlus: Small LLM-Based Text-to-Speech Library 🚀 ☆20 · May 20, 2025 · Updated 9 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆16 · Jun 16, 2024 · Updated last year
- ESLTTS dataset ☆16 · Feb 6, 2025 · Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆20 · Nov 19, 2024 · Updated last year
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆26 · Mar 17, 2025 · Updated 11 months ago
- poorman's ar-dit tts ☆45 · Dec 31, 2025 · Updated 2 months ago
- An AR+AR TTS attempt. ☆18 · Jan 13, 2025 · Updated last year
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆30 · May 27, 2023 · Updated 2 years ago
- An open source NLP as a service project focused on providing state of the art systems with ease. Training and inference by simple docker … ☆20 · Sep 17, 2024 · Updated last year
- A simple, but performant framework for mapping speech directly to categories and intents. ☆25 · Aug 8, 2024 · Updated last year
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX ☆29 · Oct 15, 2024 · Updated last year
- ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models ☆38 · Nov 18, 2025 · Updated 3 months ago
- ☆26 · Mar 20, 2024 · Updated last year
- ☆68 · Dec 30, 2025 · Updated 2 months ago
- ☆25 · Mar 6, 2024 · Updated 2 years ago
- Accelerate Whisper tasks, such as transcription, through multiprocessing-based parallelization ☆25 · Oct 29, 2022 · Updated 3 years ago
- The official implementation of the DIFFA series for dLLM-based large audio language model ☆66 · Mar 3, 2026 · Updated last week
- Fine-tuning toolkit for Chatterbox TTS & Chatterbox TURBO models. Supports 23 languages with smart vocabulary extension. Features offline… ☆79 · Feb 20, 2026 · Updated 2 weeks ago
- ☆454 · Nov 2, 2025 · Updated 4 months ago
- An open-source Kazakh Emotional Text-to-Speech Dataset ☆35 · Aug 1, 2025 · Updated 7 months ago
- This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in … ☆57 · Aug 9, 2025 · Updated 7 months ago
- My vocoder experiments ☆31 · Jul 26, 2025 · Updated 7 months ago
- This repository contains code for applying Data2Vec to pretrain Keyword Transformer model as described in "Improving Label-Deficient Keyw… ☆30 · Mar 6, 2025 · Updated last year
- [INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment" ☆64 · Jun 16, 2025 · Updated 8 months ago
- Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat… ☆56 · Jan 18, 2026 · Updated last month
- ☆13 · Oct 27, 2025 · Updated 4 months ago
- Training, inference, and testing of the SAC speech codec model. ☆100 · Nov 1, 2025 · Updated 4 months ago
- text-to-audio-latent-diffusion ☆37 · Aug 25, 2023 · Updated 2 years ago
- Audiobook creation tool with support for multiple TTS models (Qwen3-TTS, MiraTTS, GLM-TTS, IndexTTS2, VibeVoice, Higgs V2, Fish S1-mini, … ☆77 · Feb 27, 2026 · Updated last week
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆39 · Jan 6, 2024 · Updated 2 years ago
- A tool to collect/validate audio recordings from workers on Amazon Mechanical Turk. Written in Python/Flask. (originally hosted on github… ☆14 · Dec 19, 2022 · Updated 3 years ago
- Code for the paper "RIR-in-a-Box : Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation" presented at Interspeech 20… ☆16 · Sep 1, 2024 · Updated last year
- A Python command-line utility to auto-generate a subtitle file (using the free Vosk Speech Recognition API) and a translated subtitle file… ☆11 · May 5, 2024 · Updated last year
- ☆11 · Aug 11, 2023 · Updated 2 years ago
- Whisper finetuning ☆16 · Apr 9, 2025 · Updated 11 months ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming ☆15 · Aug 20, 2024 · Updated last year
- Neural Homomorphic Vocoder optimized for singing voice synthesis ☆18 · Mar 2, 2026 · Updated last week