pengzhendong / Torchaudio-Forced-Aligner

Torchaudio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.

☆11

Alternatives and similar repositories for Torchaudio-Forced-Aligner

Users who are interested in Torchaudio-Forced-Aligner compare it to the libraries listed below.

Mddct / simple-tts
(WIP) long-form speech generation
☆31Updated 2 months ago
Mddct / transformer-vocos
☆28Updated last month
p1an-lin-jung / wv_tts
☆19Updated last year
pengzhendong / audio-pipeline
☆21Updated 8 months ago
Mddct / cosyvoice2-flow-optimized
faster inference
☆28Updated 5 months ago
frankyoujian / Edge-Punct-Casing
☆28Updated 4 months ago
shivammehta25 / BetterFastSpeech2
Just another FastSpeech 2 but cleaner code :)
☆26Updated 11 months ago
rishikksh20 / MiniMax-TTS-pytorch
An attempt to replicate the architecture of MiniMaxTTS described in its technical report
☆33Updated last month
huutuongtu / Lightvoc
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆16Updated last year
lifeiteng / Aligner-SUPERB
Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark
☆28Updated last month
pengzhendong / wetext
Python runtime for WeTextProcessing (does not depend on Pynini)
☆12Updated 3 months ago
amphionspace / tts-evaluation
An evaluation set for large-scale trained TTS models (Coming in Sep 2024)
☆12Updated 9 months ago
xiaomi-research / dasheng-denoiser
Official PyTorch inference code for the Interspeech 2025 paper: Efficient Speech Enhancement via Embeddings from Pre-trained Generative A…
☆38Updated last week
OlaWod / PitchVC
PitchVC: Pitch Conditioned Any-to-Many Voice Conversion
☆34Updated last year
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆17Updated 8 months ago
jisang93 / VISinger
Unofficial pytorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC…
☆15Updated 2 years ago
ex3ndr / supervoice-hybrid
My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆27Updated 10 months ago
MiscellaneousStuff / PhoneLM
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48Updated last year
liuhuang31 / Megatts2_HierSpeechpp
Megatts2 using HierSpeechpp's vocoder
☆18Updated 6 months ago
ozspeech / OZSpeech
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆36Updated 4 months ago
IDEA-Emdoor-Lab / DistilCodec
A Neural Audio Codec (NAC) for Universal Audio
☆37Updated 3 weeks ago
ductuantruong / speaker_age_estimation_ssl_study
Official implementation of the APSIPA 2022 paper: Exploring Speaker Age Estimation on Different Self-Supervised Learning Models
☆14Updated 2 years ago
ishine / Mutiband-HIFIGAN
Multiband version of HIFIGAN
☆18Updated 4 years ago
wenet-e2e / WeSpeech-AI
Open Source Speech/Text Data on AI
☆18Updated 2 years ago
shang0712 / HierTTS
☆45Updated 2 years ago
choiHkk / Transformer-TTS-V2
☆25Updated last year
IDEA-Emdoor-Lab / UniTTS
A TTS Trained on Universal Audio.
☆34Updated 2 weeks ago
xinshengwang / robpitch
A pitch detection model trained to be robust to noisy and reverberant environments.
☆26Updated 5 months ago
liuhuang31 / g2pw_once
G2pw's inference speed is accelerated by about 8-10 times. Change loop generated predictive data to only once and model loop prediction b…
☆14Updated last year
lmxue / ICASSP2022_TTS_VC_Summary
ICASSP2022 TTS&VC Summary
☆14Updated 3 years ago