lukerbs / forcealign
ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word- or phoneme-level text alignments of audio, with each word's or phoneme's start and end time within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.
☆16 Updated 6 months ago
Alternatives and similar repositories for forcealign
Users who are interested in forcealign are comparing it to the libraries listed below.
- ☆56 Updated 2 years ago
- GPT for FACodec ☆13 Updated last year
- Official Code for ParrotTTS ☆51 Updated 8 months ago
- A collection of all our phonemizers for dataset construction and inference ☆24 Updated 4 months ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 Updated 2 years ago
- Text-To-Speech for NotebookLM ☆32 Updated 6 months ago
- ☆19 Updated last year
- ☆21 Updated 8 months ago
- ☆13 Updated 10 months ago
- ☆13 Updated last year
- ☆57 Updated 11 months ago
- 'Grad-TTS' with Multilingual Cleaners ☆10 Updated last year
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆63 Updated 2 years ago
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆25 Updated 6 months ago
- VoiceBank-2023 is the speech corpus specially designed for constructing personalized Mandarin text-to-speech (TTS) systems. ☆39 Updated last year
- ☆28 Updated 4 months ago
- Just another FastSpeech 2 but cleaner code :) ☆26 Updated 11 months ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark ☆28 Updated last month
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless, and TortoiseTTS into one ☆27 Updated 10 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆17 Updated 7 months ago
- Collection of scripts from mHuBERT-147. ☆27 Updated 7 months ago
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin… ☆52 Updated 2 years ago
- Torch implementation of Whisper-guided DDPM based Voice Conversion ☆49 Updated 2 years ago
- ☆33 Updated 3 years ago
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting" ☆46 Updated last month
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment ☆16 Updated 3 years ago
- An implementation of Charactr, Inc's "WavThruVec: Latent speech representation as intermediate features for neural speech synthesis" ☆28 Updated last year
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆28 Updated 2 years ago
- Decoders from Kaldi using OpenFst ☆29 Updated 5 months ago
- The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions ☆51 Updated 4 years ago