neonbjb / DL-Art-School
DLAS - A configuration-driven trainer for generative models
☆142 · Oct 11, 2022 · Updated 3 years ago
Alternatives and similar repositories for DL-Art-School
Users who are interested in DL-Art-School are comparing it to the libraries listed below.
- Performant and accurate speech recognition built on Pytorch ☆254 · May 19, 2022 · Updated 3 years ago
- TorToiSe fine-tuning with DLAS ☆226 · Aug 1, 2024 · Updated last year
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models ☆175 · Dec 18, 2023 · Updated 2 years ago
- Train the next generation of TTS systems. ☆171 · Sep 13, 2024 · Updated last year
- Fast TorToiSe inference (5x or your money back!) ☆830 · Jul 10, 2024 · Updated last year
- Community framework for training tortoise ☆44 · Oct 29, 2022 · Updated 3 years ago
- FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens… ☆38 · Feb 10, 2026 · Updated last week
- PyTorch Implementation of Google's Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling ☆191 · Nov 18, 2021 · Updated 4 years ago
- Pytorch implementation of BigVSAN ☆203 · Dec 9, 2025 · Updated 2 months ago
- Create training data for training a voice cloner for bark text to speech. ☆48 · Jun 13, 2023 · Updated 2 years ago
- A fast MP3 decoder for python, using minimp3 ☆30 · Sep 20, 2022 · Updated 3 years ago
- ☆259 · May 15, 2023 · Updated 2 years ago
- Make-A-Video Latent Diffusion Model ☆19 · Nov 15, 2023 · Updated 2 years ago
- ☆389 · Sep 3, 2024 · Updated last year
- PPG-Based Voice Conversion ☆348 · Jul 22, 2022 · Updated 3 years ago
- A multi-voice TTS system trained with an emphasis on quality ☆14,809 · Nov 19, 2024 · Updated last year
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis ☆1,064 · Aug 7, 2024 · Updated last year
- The Open Source Code of UniAudio ☆603 · Jul 22, 2024 · Updated last year
- State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio. ☆1,714 · Jan 26, 2026 · Updated 3 weeks ago
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning ☆159 · Jun 13, 2024 · Updated last year
- GPT-style network for phonemization with durations of text ☆68 · Mar 21, 2024 · Updated last year
- General Speech Restoration ☆1,284 · Feb 17, 2025 · Updated last year
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch. ☆132 · Sep 25, 2023 · Updated 2 years ago
- Include Basis-MelGAN, MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future. ☆157 · Jul 2, 2021 · Updated 4 years ago
- Rich Prosody Diversity Modelling with Phone-level Mixture Density Network ☆45 · Dec 1, 2021 · Updated 4 years ago
- Unofficial PyTorch Implementation of UnivNet Vocoder (https://arxiv.org/abs/2106.07889) ☆281 · Oct 8, 2021 · Updated 4 years ago
- Official PyTorch implementation of BigVGAN (ICLR 2023) ☆1,184 · Sep 5, 2024 · Updated last year
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆289 · Oct 12, 2025 · Updated 4 months ago
- Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch ☆1,334 · Sep 24, 2023 · Updated 2 years ago
- Tools to create your own voice dataset for TTS training ☆70 · Oct 26, 2020 · Updated 5 years ago
- A simple script to prepare dataset for training with TTS Tortoise model via https://git.ecker.tech/mrq/ai-voice-cloning ☆12 · Jan 12, 2024 · Updated 2 years ago
- 🕵️‍♂️🔊 Automatically update Audio Deepfake Detection (ADD) papers daily using GitHub Actions (updates every 12 hours) ☆17 · Updated this week
- [WIP] VoiceSmith makes training text to speech models easy. ☆228 · Oct 10, 2022 · Updated 3 years ago
- Collect Voice Conversion researches ☆96 · Updated this week
- Audio generation using diffusion models, in PyTorch. ☆2,096 · Jun 12, 2023 · Updated 2 years ago
- TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion ☆148 · Jan 15, 2024 · Updated 2 years ago
- [NeurIPS 2024] Code, Dataset, Samples for the VATT paper “Tell What You Hear From What You See - Video to Audio Generation Through Text” ☆35 · Jul 24, 2025 · Updated 6 months ago
- This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab. ☆600 · Sep 18, 2023 · Updated 2 years ago
- [ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations ☆142 · Apr 27, 2024 · Updated last year