KoMyeongJin/SpecDiff-GAN

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/KoMyeongJin/SpecDiff-GAN)

KoMyeongJin / SpecDiff-GAN

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

☆40

Alternatives and similar repositories for SpecDiff-GAN

Users that are interested in SpecDiff-GAN are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

0913ktg / SC_VALL-E
View on GitHub
Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E
☆136Oct 23, 2024Updated last year
seongho608 / RingFormer
View on GitHub
☆52Jun 24, 2025Updated last year
0913ktg / TGSC_LSTM-TCN-GANs
View on GitHub
☆17Jan 6, 2022Updated 4 years ago
shivammehta25 / BetterFastSpeech2
View on GitHub
Just another FastSpeech 2 but cleaner code :)
☆29Jun 28, 2024Updated 2 years ago
p0p4k / Matcha-TTS-2
View on GitHub
E2E TTS using Conditional Flow Matching (Experimental*)
☆71Nov 10, 2023Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
keonlee9420 / DiffGAN-TTS
View on GitHub
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
☆350Feb 21, 2022Updated 4 years ago
dinhoitt / BemaGANv2
View on GitHub
☆22Mar 3, 2026Updated 4 months ago
0913ktg / 5G-Traffic-Generator
View on GitHub
☆22Jul 26, 2023Updated 3 years ago
zengchang233 / CrossSinger
View on GitHub
The source code for the paper CrossSinger (asru2023)
☆18Oct 12, 2023Updated 2 years ago
thuhcsi / SnakeGAN
View on GitHub
Please visit https://thuhcsi.github.io/SnakeGAN/
☆37Apr 25, 2023Updated 3 years ago
lakahaga / dc-comix-tts
View on GitHub
Implementation of DCComix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
☆74Aug 21, 2023Updated 2 years ago
OlaWod / PitchVC
View on GitHub
PitchVC: Pitch Conditioned Any-to-Many Voice Conversion
☆35Jun 6, 2024Updated 2 years ago
rishikksh20 / NU-Wave2-pytorch
View on GitHub
NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates [WIP]
☆25Jul 5, 2022Updated 4 years ago
RickyL-2000 / AlignSTS
View on GitHub
Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment
☆68Jul 5, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
alsgur9368 / FM-Singer
View on GitHub
☆19Jan 19, 2026Updated 6 months ago
justinjohn0306 / SpeedScribe
View on GitHub
High-performance ASR tool using Faster Whisper, supporting custom models, multi-language transcription, and real-time processing feedback…
☆10Sep 17, 2025Updated 10 months ago
yl4579 / StyleTTS-VC
View on GitHub
Official Implementation of StyleTTS-VC
☆200Jan 14, 2025Updated last year
exercise-book-yq / FreeCodec
View on GitHub
FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS
☆24Sep 9, 2024Updated last year
Scarfmonster / HiFiPLN
View on GitHub
Multispeaker Community Vocoder Model for DiffSinger
☆39Aug 11, 2025Updated 11 months ago
naver-ai / facetts
View on GitHub
☆61May 17, 2023Updated 3 years ago
yl4467 / singer
View on GitHub
☆15Feb 22, 2025Updated last year
xinshengwang / robpitch
View on GitHub
A pitch detection model trained to be robust against noise and reverberation environments.
☆27Jan 21, 2025Updated last year
zhenye234 / CoMoSpeech
View on GitHub
ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
☆214Apr 26, 2024Updated 2 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
Adibian / ResGrad
View on GitHub
Unofficial implementation of ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
☆20Feb 9, 2025Updated last year
hcy71o / SNAC
View on GitHub
Unofficial Pytorch implementation of SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speake…
☆57Aug 7, 2023Updated 2 years ago
audiodemo / voice-conversion
View on GitHub
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17Aug 18, 2023Updated 2 years ago
wetdog / wavenext_pytorch
View on GitHub
Unofficial implementation of wavenext vocoder
☆59Aug 28, 2024Updated last year
sarulab-speech / msr-utmos
View on GitHub
sampling frequency independent convolution for MOS prediction
☆15Jul 22, 2025Updated last year
0nutation / USLM
View on GitHub
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152Sep 14, 2023Updated 2 years ago
qiuk2 / AAR
View on GitHub
[Official Implementation] Acoustic Autoregressive Modeling 🔥
☆74Aug 24, 2024Updated last year
reppy4620 / vocoders
View on GitHub
My vocoder experiments
☆31Jul 26, 2025Updated last year
PlayVoice / VI-SVC
View on GitHub
VI-SVC model is just VITS without MAS and DurationPredictor.
☆10Nov 9, 2023Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
dmse4tts / DMSE4TTS
View on GitHub
☆24May 6, 2025Updated last year
AI-S2-Lab / EmoPP
View on GitHub
[NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech
☆22Aug 20, 2024Updated last year
WangHelin1997 / DuTa-VC
View on GitHub
Source code and demo for INTERPSEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…
☆38Dec 5, 2023Updated 2 years ago
bshall / urhythmic
View on GitHub
Unsupervised Rhythm Modeling for Voice Conversion
☆85Aug 3, 2023Updated 2 years ago
zds-potato / multilingual-phonetic-sv
View on GitHub
☆10Dec 22, 2023Updated 2 years ago
KdaiP / StableTTS
View on GitHub
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
☆437Sep 13, 2024Updated last year
sarulab-speech / UTMOS22
View on GitHub
UT-Sarulab MOS prediction system using SSL models
☆308Apr 11, 2024Updated 2 years ago