ArenAcikgoz/Whisper-Alignment

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ArenAcikgoz/Whisper-Alignment)

ArenAcikgoz / Whisper-Alignment

Forced alignment decoder for Whisper.

☆16

Alternatives and similar repositories for Whisper-Alignment

Users that are interested in Whisper-Alignment are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

NTIA / alignnet
View on GitHub
Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.
☆18Aug 1, 2025Updated 11 months ago
primepake / F5-TTS-meanflow-multilingual
View on GitHub
Meanflow and multilingual for F5-TTS model
☆16Aug 23, 2025Updated 11 months ago
Mddct / simple-tts
View on GitHub
（WIP）long form speech generatoins
☆30Apr 2, 2025Updated last year
Mddct / transformer-vocos
View on GitHub
☆35Sep 6, 2025Updated 10 months ago
yoongi43 / VRVQ
View on GitHub
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11Apr 10, 2025Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
Tikai7 / DiTTO-TTS
View on GitHub
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors
☆39Feb 11, 2025Updated last year
llm-lab-org / CLASP
View on GitHub
CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval
☆13Jun 27, 2025Updated last year
jhuang448 / MultilingualALT
View on GitHub
Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model""
☆15Jun 28, 2024Updated 2 years ago
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
kjw11 / Speaker-Aware-CTC
View on GitHub
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22May 26, 2025Updated last year
google-deepmind / librispeech-long
View on GitHub
LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …
☆99Dec 28, 2024Updated last year
utter-project / mHuBERT-147-scripts
View on GitHub
Collection of scripts from mHuBERT-147.
☆35Nov 19, 2024Updated last year
lucadellalib / discrete-wavlm-codec
View on GitHub
A neural speech codec based on discrete WavLM representations
☆26Aug 28, 2024Updated last year
fluxions-ai / stftvae
View on GitHub
Inference for the STFT-VAE continuous audio codec (24kHz, 3.125Hz latent)
☆43Jul 12, 2026Updated last week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
corticph / error-align
View on GitHub
Text-to-text alignment algorithm for speech recognition error analysis.
☆32Jun 23, 2026Updated last month
mcf330 / efts2code
View on GitHub
source code of EfficientTTS 2
☆21Feb 18, 2024Updated 2 years ago
Mddct / cosyvoice2-flow-optimized
View on GitHub
faster inference
☆27Jan 20, 2025Updated last year
LAION-AI / emotional-speech-annotations
View on GitHub
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35Oct 13, 2024Updated last year
reppy4620 / convnext_tts
View on GitHub
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18Oct 20, 2024Updated last year
aqtq314 / VogenSVS
View on GitHub
☆15Apr 16, 2026Updated 3 months ago
lifeiteng / NotebookTTS
View on GitHub
Text-To-Speech for NotebookLM
☆39Jul 20, 2025Updated last year
mubtasimahasan / DM-Codec
View on GitHub
Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”
☆57Jun 1, 2025Updated last year
duerig / StyleTTS2
View on GitHub
StyleTTS 2 Optimized Training Fork
☆32Feb 2, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
fakerybakery / utmos
View on GitHub
A toolkit to calculate speech audio quality. Not affiliated with the original authors
☆74Aug 13, 2024Updated last year
Chengyuann / AutoStyle-TTS
View on GitHub
Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-…
☆26Feb 1, 2026Updated 5 months ago
walker-hyf / NCSSD
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61Nov 1, 2024Updated last year
ORI-Muchim / Efficient-Speech
View on GitHub
Lightweight Korean TTS Model based on FastSpeech2
☆15Mar 4, 2026Updated 4 months ago
cantabile-kwok / vec2wav2.0
View on GitHub
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
☆79Dec 3, 2024Updated last year
freds0 / free-svc
View on GitHub
[ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
☆95Jul 23, 2025Updated last year
k2-fsa / kaldi-decoder
View on GitHub
Decoders from Kaldi using OpenFst
☆35Apr 10, 2026Updated 3 months ago
Picovoice / text-to-speech-benchmark
View on GitHub
Text-to-Speech Benchmark
☆26Apr 2, 2026Updated 3 months ago
choiHkk / Transformer-TTS-V2
View on GitHub
☆25Mar 6, 2024Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
Zhongxu-Wang / ArtSpeech
View on GitHub
ArtSpeech: Adaptive Text-to-Speech Synthesis with Articulatory Representations
☆22Sep 21, 2025Updated 10 months ago
ZhikangNiu / A-DMA
View on GitHub
[INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
☆67Jun 16, 2025Updated last year
thuhcsi / VoxInstruct
View on GitHub
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
☆100Nov 9, 2024Updated last year
KdaiP / DC-Speech-VAE
View on GitHub
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57Nov 19, 2025Updated 8 months ago
R1ckShi / FrontEnd-AEC
View on GitHub
Acoustic echo cancelation(AEC) is a main algorithm in the pipe line of acoustic devices with KWS or ASR. FNLMS is used.
☆19Apr 22, 2019Updated 7 years ago
ozspeech / OZSpeech
View on GitHub
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆45Feb 9, 2025Updated last year
ScottishFold007 / TTSAudioNormalizer
View on GitHub
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…
☆112Dec 20, 2024Updated last year