jumon/whisper-punctuator

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jumon/whisper-punctuator)

jumon / whisper-punctuator

Zero-shot multimodal punctuation insertion and truecasing using Whisper

☆120

Alternatives and similar repositories for whisper-punctuator

Users that are interested in whisper-punctuator are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Aria-K-Alethia / laughter-synthesis
View on GitHub
Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…
☆77Jul 16, 2023Updated 3 years ago
ndkgit339 / spe-dss
View on GitHub
Speech Parameter Estimation Using Differentiable Speech Synthesizer
☆43May 9, 2023Updated 3 years ago
revsic / torch-diffusion-wavegan
View on GitHub
Parallel waveform generation with DiffusionGAN
☆17Mar 26, 2022Updated 4 years ago
p1an-lin-jung / wv_tts
View on GitHub
☆19Mar 22, 2024Updated 2 years ago
tts-tutorial / interspeech2022
View on GitHub
☆162Sep 19, 2022Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Takaaki-Saeki / zm-text-tts
View on GitHub
[IJCAI'23] Learning to Speak from Text for Low-Resource TTS
☆65May 30, 2023Updated 3 years ago
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
sp-nitech / diffsptk
View on GitHub
A differentiable version of SPTK
☆201Jul 14, 2026Updated 2 weeks ago
nvidia-riva / riva-asrlib-decoder
View on GitHub
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
☆91Feb 18, 2025Updated last year
ryanrudes / YTTTS
View on GitHub
The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions
☆53Apr 1, 2021Updated 5 years ago
IDEA-Emdoor-Lab / DistilCodec
View on GitHub
A Neural Audio Codec (NAC) for Universal Audio
☆47May 30, 2025Updated last year
miccio-dk / NISQA
View on GitHub
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16Apr 13, 2022Updated 4 years ago
besacier / ASR2022
View on GitHub
☆57Dec 19, 2022Updated 3 years ago
yl4579 / PitchExtractor
View on GitHub
Deep Neural Pitch Extractor for Voice Conversion and TTS Training
☆152Aug 22, 2022Updated 3 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
maxrmorrison / psola
View on GitHub
Pitch-shifting and time-stretching with TD-PSOLA
☆90Aug 16, 2023Updated 2 years ago
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54May 1, 2025Updated last year
haiciyang / LaDiffCodec
View on GitHub
ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.
☆56Nov 16, 2025Updated 8 months ago
MiscellaneousStuff / PhoneLM
View on GitHub
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48Sep 4, 2023Updated 2 years ago
line / LibriTTS-P
View on GitHub
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
☆161Jun 13, 2024Updated 2 years ago
Srijith-rkr / Whispering-LLaMA
View on GitHub
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
☆271May 19, 2024Updated 2 years ago
MiniXC / LightningFastSpeech2
View on GitHub
☆55Jan 13, 2023Updated 3 years ago
jumon / whisper-finetuning
View on GitHub
[WIP] Scripts for fine-tuning Whisper
☆221Jul 2, 2026Updated 3 weeks ago
chomeyama / SiFiGAN
View on GitHub
Official implementation of the source-filter HiFiGAN vocoder
☆275Jul 29, 2023Updated 3 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
lumaku / ctc-segmentation
View on GitHub
Segment an audio file and obtain utterance alignments. (Python package)
☆348May 15, 2024Updated 2 years ago
huggingface / speechbox
View on GitHub
☆358Mar 17, 2024Updated 2 years ago
tarepan / VoiceConversionLab
View on GitHub
Collect Voice Conversion researches
☆97Updated this week
yukara-ikemiya / Open-Miipher-2
View on GitHub
PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind
☆70Sep 22, 2025Updated 10 months ago
ex3ndr / supervoice-gpt
View on GitHub
GPT-style network for phonemization with durations of text
☆68Mar 21, 2024Updated 2 years ago
supertone-inc / super-monotonic-align
View on GitHub
☆173Sep 19, 2024Updated last year
imdanboy / jets
View on GitHub
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech
☆111Jun 6, 2022Updated 4 years ago
winddori2002 / TriAAN-VC
View on GitHub
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
☆146Jan 15, 2024Updated 2 years ago
AlanBaade / SyllableLM
View on GitHub
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆63Jul 1, 2025Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
X-LANCE / StoryTTS
View on GitHub
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
☆141Apr 27, 2024Updated 2 years ago
seastar105 / pflow-encodec
View on GitHub
Implementation of TTS model based on NVIDIA P-Flow TTS Paper
☆77Jul 13, 2026Updated 2 weeks ago
EndlessReform / smoltts
View on GitHub
Open TTS models, built for streaming on the edge
☆45Mar 16, 2025Updated last year
neosapience / editts
View on GitHub
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech (INTERSPEECH 2022)
☆122Jan 24, 2023Updated 3 years ago
NVIDIA / radtts
View on GitHub
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, …
☆291Apr 6, 2023Updated 3 years ago
yukara-ikemiya / wavefit-pytorch
View on GitHub
PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.
☆70Jul 13, 2026Updated 2 weeks ago
rishikksh20 / Avocodo-pytorch
View on GitHub
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
☆122Jul 14, 2022Updated 4 years ago