youmebangbang / TTS-dataset-tools
Automatically generates a TTS dataset from audio and its associated text, making cuts under a custom length. Uses the Google Speech-to-Text API to perform diarization and transcription, or aeneas to force-align text to audio.
☆52 · Apr 17, 2022 · Updated 3 years ago
Alternatives and similar repositories for TTS-dataset-tools
Users interested in TTS-dataset-tools are comparing it to the libraries listed below.
- Split long audio files based on subtitle info in an SRT file (transcript saved in CSV) · ☆20 · Nov 14, 2019 · Updated 6 years ago
- KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a… · ☆12 · Mar 24, 2023 · Updated 2 years ago
- A curated list of other awesome open-source government organisations and projects · ☆13 · Apr 28, 2022 · Updated 3 years ago
- ☆14 · Aug 19, 2024 · Updated last year
- ☆18 · Jun 23, 2021 · Updated 4 years ago
- Tools to create your own voice dataset for TTS training · ☆70 · Oct 26, 2020 · Updated 5 years ago
- An implementation of the 3D Ken Burns Effect from a Single Image using PyTorch · ☆37 · Aug 3, 2020 · Updated 5 years ago
- ☆18 · Aug 17, 2022 · Updated 3 years ago
- An implementation of the paper titled "Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset" https://… · ☆15 · Feb 17, 2022 · Updated 3 years ago
- ☆63 · Feb 5, 2021 · Updated 5 years ago
- Speaker-disentangled speech linguistic content quantizer · ☆24 · Mar 19, 2025 · Updated 10 months ago
- Performant and accurate speech recognition built on PyTorch · ☆254 · May 19, 2022 · Updated 3 years ago
- ☆19 · Jul 11, 2024 · Updated last year
- Multivoice: Enhance your foreign-language movie and TV show experience with personalized dubbed versions. Our project uses voice cloning … · ☆27 · Aug 1, 2023 · Updated 2 years ago
- An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT-3). The legacy GUI is no longer maintained. · ☆20 · Nov 6, 2023 · Updated 2 years ago
- ☆20 · Mar 16, 2023 · Updated 2 years ago
- TAPE: An End-to-End Timbre-Aware Pitch Estimator · ☆23 · Nov 25, 2023 · Updated 2 years ago
- This repository will contain code for the paper "CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot tr… · ☆26 · Dec 23, 2023 · Updated 2 years ago
- Train neural networks to generate watercolour paintings from pencil sketches. · ☆20 · Oct 30, 2018 · Updated 7 years ago
- Non-Parallel Voice Conversion based on VITS · ☆24 · Mar 31, 2023 · Updated 2 years ago
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS · ☆64 · May 30, 2023 · Updated 2 years ago
- Vecna is a Python chatbot that recommends songs and movies depending on your feelings · ☆11 · Jun 28, 2022 · Updated 3 years ago
- ☆26 · Aug 8, 2024 · Updated last year
- Translated vocal synthesis: Clone a voice and output speech in another language · ☆26 · May 3, 2022 · Updated 3 years ago
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis · ☆27 · Mar 21, 2025 · Updated 10 months ago
- Talking head animation · ☆28 · Dec 8, 2023 · Updated 2 years ago
- This is a collection of resources on AI-AR-ART generation. · ☆28 · Dec 14, 2022 · Updated 3 years ago
- SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model · ☆107 · Sep 10, 2021 · Updated 4 years ago
- StyleTTS 2 Optimized Training Fork · ☆33 · Feb 2, 2025 · Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) · ☆78 · Nov 1, 2024 · Updated last year
- Finetune the 1.4B latent diffusion text2img-large checkpoint from CompVis using DeepSpeed (work in progress) · ☆36 · Apr 17, 2022 · Updated 3 years ago
- Multi-step AI agents powered by Gemini 2.0 and the LangGraph framework. These agents orchestrate complex workflows and enhance their reas… · ☆10 · Dec 19, 2024 · Updated last year
- HiFi-SR is a Python-based pipeline for the detection of plant mitochondrial structural rearrangements based on the mapping of PacBio high… · ☆10 · Apr 15, 2025 · Updated 10 months ago
- Agile metrics tools allows you to track metrics from different sources in order to identify trends and patterns on how your team performa… · ☆11 · Jan 2, 2026 · Updated last month
- Torch implementation of NANSY, Neural Analysis and Synthesis, arXiv:2110.14513 · ☆64 · Feb 13, 2023 · Updated 3 years ago
- Physics-based Zero-Shot Video Generation · ☆31 · Oct 4, 2024 · Updated last year
- A lightweight, efficient variation of the StyleTTS 2 text-to-speech model. · ☆52 · May 22, 2025 · Updated 8 months ago
- PyTorch implementation of NEUTART, a system that creates photorealistic talking avatars from an input text transcription. · ☆34 · Mar 11, 2025 · Updated 11 months ago
- Overcooked! 2 TAS Development Framework · ☆10 · Aug 18, 2023 · Updated 2 years ago