lukerbs / forcealign
ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level text alignments of audio, with each word or phoneme's start and end time within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.
☆15 Updated 6 months ago
Alternatives and similar repositories for forcealign
Users who are interested in forcealign are comparing it to the libraries listed below.
- GPT for FACodec ☆13 Updated last year
- ☆56 Updated 2 years ago
- A collection of all our phonemeizers for dataset construction and inference ☆23 Updated 3 months ago
- Incorporating AutoVocoder to MB-iSTFT-VITS ☆48 Updated 2 years ago
- Text-To-Speech for NotebookLM ☆29 Updated 5 months ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark ☆27 Updated 3 weeks ago
- ☆20 Updated 7 months ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆28 Updated 2 years ago
- ☆12 Updated 2 years ago
- ☆13 Updated 9 months ago
- speaker-disentangled speech linguistic content quantizer ☆16 Updated 2 months ago
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆25 Updated 5 months ago
- Official Code for ParrotTTS ☆51 Updated 7 months ago
- ☆15 Updated 2 months ago
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation ☆13 Updated 8 months ago
- ☆18 Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆71 Updated 7 months ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat… ☆33 Updated 11 months ago
- A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts ☆14 Updated 6 months ago
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors ☆24 Updated 3 months ago
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment ☆16 Updated 3 years ago
- An AR+AR TTS attempt. ☆16 Updated 4 months ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆27 Updated 9 months ago
- Just another FastSpeech 2 but cleaner code :) ☆26 Updated 11 months ago
- text to speech ☆10 Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ☆24 Updated 11 months ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 Updated 2 years ago
- ☆10 Updated 6 months ago
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting" ☆31 Updated 3 weeks ago
- ☆57 Updated 11 months ago