feldberlin / timething
Timething is a library for aligning text transcripts with their audio recordings.
☆114 Updated 3 months ago
Alternatives and similar repositories for timething:
Users who are interested in timething are comparing it to the libraries listed below.
- Python forced alignment ☆86 Updated 10 months ago
- Simple Diarization model ☆47 Updated last year
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆109 Updated 2 years ago
- 🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing. ☆234 Updated 8 months ago
- Text to speech alignment using CTC forced alignment ☆223 Updated last week
- Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p! ☆148 Updated last month
- Python library for manipulating pronunciations using the International Phonetic Alphabet (IPA) ☆86 Updated last year
- phoneme tokenizer and grapheme-to-phoneme model for 8k languages ☆156 Updated last year
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code ☆145 Updated 10 months ago
- Segment an audio file and obtain utterance alignments. (Python package) ☆330 Updated 9 months ago
- ☆80 Updated 9 months ago
- The EveryVoice TTS Toolkit - Text To Speech for your language ☆24 Updated 2 weeks ago
- Google's SoundStorm: Efficient Parallel Audio Generation ☆131 Updated last year
- SelfRemaster: SSL Speech Restoration ☆88 Updated last year
- Collection of pretrained models for the Montreal Forced Aligner ☆131 Updated 7 months ago
- A sequence-to-sequence voice conversion toolkit. ☆93 Updated 7 months ago
- A tokenizer, text cleaner, and phonemizer for many human languages. ☆305 Updated 3 months ago
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition ☆15 Updated last year
- This is the implementation for "ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm" ☆131 Updated last year
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆69 Updated 3 months ago
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations ☆144 Updated 11 months ago
- Synchronize Whisper's timestamps over an existing accurate transcription ☆142 Updated 9 months ago
- ☆33 Updated 8 months ago
- Easy-to-Use Speech MOS predictors ☆265 Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆80 Updated last year
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context ☆190 Updated 5 months ago
- A python package for deep multilingual punctuation prediction. ☆117 Updated 6 months ago
- Application of MB-iSTFT-VITS components to vits2_pytorch ☆122 Updated 3 months ago
- Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation. ☆88 Updated 2 years ago
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆135 Updated last year