34j/awesome-vits

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/34j/awesome-vits)

34j / awesome-vits

List of repositories relevant to VITS.

☆36

Alternatives and similar repositories for awesome-vits

Users that are interested in awesome-vits are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

talhanai / kaldi-diar-latte
View on GitHub
steps to perform text-based speaker diarization with kaldi toolkit
☆12Nov 2, 2018Updated 7 years ago
deepaudio / deepaudio-speaker
View on GitHub
neural network based speaker embedder
☆24Jan 7, 2023Updated 3 years ago
Adibian / Persian-MultiSpeaker-Tacotron2
View on GitHub
Implementation of Transfer Learning from Speaker Verification to Multi-speaker Text-To-Speech Synthesis (SV2TTS) in Persian language.
☆13Oct 2, 2025Updated 9 months ago
pkufool / simple-wer
View on GitHub
A simple command line tool to calculate WER for ASR.
☆14Oct 14, 2024Updated last year
exercise-book-yq / Supercodec
View on GitHub
☆51Mar 5, 2026Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
nateraw / voice-cloning
View on GitHub
Make Kanye sing any song ya want 🎤🔥
☆26Apr 25, 2023Updated 3 years ago
nii-yamagishilab / vctk-silence-labels
View on GitHub
☆25Oct 4, 2022Updated 3 years ago
ml-for-speech / speechtoolkit
View on GitHub
[Early Alpha] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activit…
☆22Jan 10, 2025Updated last year
ethman / tagbox
View on GitHub
Steer OpenAI's Jukebox with Music Taggers
☆42Apr 21, 2022Updated 4 years ago
bshall / dusted
View on GitHub
DUSTED: Spoken-Term Discovery using Discrete Speech Units
☆17Oct 2, 2024Updated last year
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
Aratako / CALM-DACVAE
View on GitHub
An attempt to reproduce CALM (Continuous Audio Language Models) using DACVAE as the audio VAE.
☆18Feb 20, 2026Updated 5 months ago
alumae / streaming-punctuator
View on GitHub
☆17Apr 14, 2023Updated 3 years ago
Chutlhu / dEchorate
View on GitHub
Da - ECHO - RetrievAl - daTasEt
☆36Jul 7, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
patrickvonplaten / Wav2Vec2_ParlanceCTCDecode
View on GitHub
☆11Nov 5, 2021Updated 4 years ago
AmphionTeam / FlexiCodec
View on GitHub
[ICLR2026] FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
☆50Jul 1, 2026Updated 3 weeks ago
idnavid / speech_activity_detection
View on GitHub
Unsupervised speech activity detection system.
☆11Jul 2, 2018Updated 8 years ago
gpu-poor / gramvaani_hindi_asr
View on GitHub
This repo contains the baseline model recipes and pre-trained model for GramVanni hindi ASR challenge
☆16Mar 26, 2022Updated 4 years ago
BUTSpeechFIT / MultiSV
View on GitHub
MultiSV: scripts for data preparation
☆31Jan 18, 2025Updated last year
pariajm / e2e-asr-and-disfluency-removal-evaluator
View on GitHub
A new metric for evaluating end-to-end speech recognition and disfluency removal systems
☆19Mar 7, 2021Updated 5 years ago
zhu-han / SpeechLLM
View on GitHub
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆35Sep 25, 2025Updated 10 months ago
alpoktem / Prosograph
View on GitHub
A Visualizer for prosodically annotated speech corpora
☆12Oct 27, 2021Updated 4 years ago
miccio-dk / NISQA
View on GitHub
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16Apr 13, 2022Updated 4 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
AndreevP / speech_distances
View on GitHub
Deep Speech Distances PyTorch
☆29Feb 21, 2022Updated 4 years ago
freds0 / kabooks
View on GitHub
KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…
☆13Mar 24, 2023Updated 3 years ago
Maitreyapatel / speech-conversion-between-different-modalities
View on GitHub
Generative Adversarial Networks for different impaired speech conversions
☆39Jul 6, 2023Updated 3 years ago
PecholaL / MAIN-VC
View on GitHub
Lightweight Speech Representation Learning for One-Shot Voice Conversion
☆23Dec 12, 2024Updated last year
kyegomez / AudioFlamingo
View on GitHub
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…
☆39Jan 27, 2025Updated last year
asappresearch / multistream-cnn
View on GitHub
Multistream CNN for Robust Acoustic Modeling
☆40Jun 17, 2021Updated 5 years ago
rwth-i6 / i6_core
View on GitHub
Sisyphus recipies for ASR
☆19Jul 17, 2026Updated last week
choiHkk / Transformer-TTS-V2
View on GitHub
☆25Mar 6, 2024Updated 2 years ago
unilight / s3prl-vc
View on GitHub
S3PRL-VC: A Voice Conversion Toolkit based on S3PRL
☆101Mar 15, 2026Updated 4 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
Daniel-W-Blender-Python / VIPER-Blender-Mocap
View on GitHub
A Blender Addon that uses István Sárándi's Metrabs 3D pose estimation model in Blender to capture a person's movements, and apply them to…
☆18Mar 29, 2026Updated 3 months ago
xkx-hub / KALL-E
View on GitHub
[AAAI 2026 oral] KALL-E:Autoregressive Speech Synthesis with Next-Distribution Prediction
☆42Sep 25, 2025Updated 10 months ago
OmniMMI / OpenOmniNexus
View on GitHub
a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
☆38Apr 7, 2025Updated last year
haoheliu / DCASE_2022_Task_5
View on GitHub
System that ranks 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection
☆28Jul 6, 2022Updated 4 years ago
bookbot-hive / k2-indonesian-asr
View on GitHub
Indonesian speech/phoneme recognizer powered by Kaldi 2.0 (lhotse, icefall, sherpa).
☆16Jun 30, 2023Updated 3 years ago
inconnu11 / Objective-evaluation_speech_synthesis
View on GitHub
☆17Mar 24, 2022Updated 4 years ago
DongKeon / webrtc-whisper-asr
View on GitHub
WebRTC-based real-time audio streaming with Faster Whisper ASR integration for live speech-to-text transcription.
☆13Sep 27, 2024Updated last year