lukerbs / forcealign
ForceAlign is a Python library for forced alignment of English text to English audio. You can use ForceAlign to get word or phoneme level text alignments of audio, with each word or phoneme's start and end time within the audio. ForceAlign was designed to be easy to install and use, without requiring any third-party, non-Python dependencies.
☆25 · Updated last year
Alternatives and similar repositories for forcealign
Users who are interested in forcealign are comparing it to the libraries listed below.
- An unofficial PyTorch implementation of VALL-E ☆88 · Updated 6 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆106 · Updated last year
- ☆58 · Updated last year
- ☆106 · Updated 4 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆111 · Updated last year
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts). ☆94 · Updated 3 months ago
- Official Code for ParrotTTS ☆58 · Updated last year
- Finetuning VITS Efficiently ☆33 · Updated 2 years ago
- Putting flows on top of neural transducers for better TTS ☆65 · Updated 2 weeks ago
- ☆55 · Updated 3 years ago
- A lightweight voice conversion ☆86 · Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer ☆56 · Updated 8 months ago
- [WIP] Unofficial Implementation of Microsoft's PromptTTS2 ☆54 · Updated 2 years ago
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆63 · Updated 2 years ago
- Official implementation of the TTS model Lina-Speech ☆176 · Updated last year
- Chinese and English Bilingual G2P ☆22 · Updated 2 years ago
- A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS ☆54 · Updated last year
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup. ☆76 · Updated last year
- Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English. ☆114 · Updated 2 months ago
- The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions ☆52 · Updated 4 years ago
- Joint CTC-S2S Phoneme-level ASR for Voice Conversion and TTS (Text-Mel Alignment) ☆124 · Updated 3 years ago
- Collection of scripts from mHuBERT-147. ☆32 · Updated last year
- ☆23 · Updated last year
- Extract phoneme-level timestamps from speech audio. ☆114 · Updated 3 weeks ago
- ☆70 · Updated 2 years ago
- Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning ☆96 · Updated last year
- audiolm-pytorch training code ☆15 · Updated 2 years ago
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?". ☆25 · Updated last year
- All generative models in one for a better TTS model ☆74 · Updated last year
- ☆44 · Updated last year