repodiac / german_transliterate
Python module to clean and transliterate (i.e. normalize) German text, including abbreviations, numbers, and timestamps. It can be used to clean messy text (e.g. map unusual Unicode characters to ASCII) or to replace common abbreviations, typically as a preprocessing step for text mining tasks.
☆32 · Updated 4 years ago
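To make the kind of normalization described above concrete, here is a minimal sketch using only the Python standard library. It does not use german_transliterate's own API; the function names, the abbreviation table, and the choice to spell umlauts as `ae`/`oe`/`ue` are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real normalizer would cover far more cases.
ABBREVIATIONS = {
    "z.B.": "zum Beispiel",
    "usw.": "und so weiter",
    "ca.": "circa",
}

# Illustrative choice: spell out umlauts and eszett rather than dropping them.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def expand_abbreviations(text: str) -> str:
    """Replace known abbreviations that do not start in the middle of a word."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"(?<!\w)" + re.escape(abbr), full, text)
    return text


def to_ascii(text: str) -> str:
    """Map text to ASCII: umlauts first, then compatibility decomposition."""
    text = text.translate(UMLAUTS)
    # NFKD splits ligatures and accented letters into base characters
    # plus combining marks; the marks are dropped by the ASCII encode step.
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


def normalize(text: str) -> str:
    """Expand abbreviations, map to ASCII, and collapse whitespace."""
    text = expand_abbreviations(text)
    text = to_ascii(text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize("Größe  ca. 5 m, z.B. für Möbel"))
```

Expanding abbreviations before the ASCII step keeps the lookup table readable in its original German spelling. Number and timestamp verbalization, which the module also covers, is left out of this sketch.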
Alternatives and similar repositories for german_transliterate:
Users interested in german_transliterate are comparing it to the libraries listed below.
- This is the official repository for the HUI-Audio-Corpus-German. The corresponding paper is in the process of publication. With the repo… ☆27 · Updated last year
- Convert English text from written expressions into spoken forms ☆24 · Updated 2 years ago
- ☆80 · Updated 8 months ago
- Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding ☆74 · Updated 3 years ago
- ☆38 · Updated 3 years ago
- Python wrappers for Kaldi Levenshtein's distance and alignment code. ☆62 · Updated 11 months ago
- Segment a given audio into utterances using a trained end-to-end ASR model. ☆73 · Updated 4 years ago
- CML-TTS: A Multilingual Dataset for Speech Synthesis ☆29 · Updated 6 months ago
- Speech Recognition for Welsh with HuggingFace ☆14 · Updated 2 years ago
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to… ☆43 · Updated 3 years ago
- multilingual speech aligner ☆72 · Updated last year
- Putting flows on top of neural transducers for better TTS ☆62 · Updated this week
- Linguistic processing for Common Voice ☆53 · Updated last year
- This is the M-AILABS Speech Dataset ☆41 · Updated 2 months ago
- Python wrapper for kaldi's arpa2fst ☆38 · Updated 2 months ago
- ☆34 · Updated 5 months ago
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆63 · Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆93 · Updated 4 months ago
- Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup ☆64 · Updated 5 months ago
- ☆111 · Updated 2 years ago
- Finetuning VITS Efficiently ☆32 · Updated last year
- Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing ☆69 · Updated 2 years ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆46 · Updated 7 months ago
- Deep Neural Pitch Extractor for Voice Conversion and TTS Training ☆121 · Updated 2 years ago
- Dataset of ICASSP 2021 MULTILINGUAL PHONETIC DATASET FOR LOW RESOURCE SPEECH RECOGNITION ☆40 · Updated last year
- ☆56 · Updated 2 years ago
- A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing. ☆101 · Updated last year
- ☆62 · Updated 9 months ago
- S3PRL-VC: A Voice Conversion Toolkit based on S3PRL ☆98 · Updated 7 months ago
- scripts to align a given wave to its transcription using trained models by Kaldi ☆32 · Updated 5 years ago