repodiac / german_transliterate
Python module to clean and transliterate (i.e. normalize) German text, including abbreviations, numbers, and timestamps. It can be used to clean messy text (e.g. map unusual Unicode characters to ASCII) or to replace common abbreviations, typically as a preprocessing step for text mining tasks.
☆30 · Updated 3 years ago
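To make the description concrete, here is a minimal sketch of this kind of normalization using only the Python standard library. It is not the API of german_transliterate: the function name `normalize_german`, the abbreviation table, and the umlaut mapping are all hypothetical and chosen for illustration, and the sketch does not attempt to handle numbers or timestamps.

```python
import re
import unicodedata

# Hypothetical abbreviation table for illustration only;
# it is NOT taken from german_transliterate.
ABBREVIATIONS = {
    "z.B.": "zum Beispiel",
    "usw.": "und so weiter",
    "ca.": "circa",
}

# Conventional German ASCII spellings, applied before generic accent stripping.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def normalize_german(text: str) -> str:
    """Expand a few abbreviations and map the text to plain ASCII."""
    # Fold compatibility characters (ligatures, full-width forms) and
    # compose base letter + combining mark into a single code point.
    text = unicodedata.normalize("NFKC", text)
    # Expand abbreviations only when they do not continue a longer word.
    for abbr, expansion in ABBREVIATIONS.items():
        text = re.sub(r"(?<!\w)" + re.escape(abbr), expansion, text)
    # Replace umlauts and sharp s with their usual ASCII digraphs.
    text = text.translate(UMLAUTS)
    # Decompose remaining accented letters and drop anything non-ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of whitespace.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_german("Größe z.B.  ca. 5 m"))
```

The order of the steps matters in this sketch: the umlaut mapping runs before the generic accent stripping, since otherwise "ö" would be reduced to "o" rather than "oe".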
Related projects
Alternatives and complementary repositories for german_transliterate
- ☆62 · Updated 6 months ago
- This is the official repository for the HUI-Audio-Corpus-German. The corresponding paper is in the process of publication. With the repo… ☆26 · Updated last year
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆19 · Updated 2 months ago
- Python wrappers for Kaldi Levenshtein's distance and alignment code. ☆61 · Updated 8 months ago
- Dataset of ICASSP 2021 MULTILINGUAL PHONETIC DATASET FOR LOW RESOURCE SPEECH RECOGNITION ☆37 · Updated last year
- Grapheme-to-Phoneme transductions that preserve input and output indices, and support cross-lingual g2p! ☆135 · Updated this week
- A toolkit to calculate speech audio quality. Not affiliated with the original authors ☆39 · Updated 3 months ago
- This is the M-AILABS Speech Dataset ☆22 · Updated 4 months ago
- multilingual speech aligner ☆72 · Updated last year
- Incorporating KenLM language model with HuggingFace implementation of Wav2Vec2CTC Model using beam search decoding ☆71 · Updated 3 years ago
- ☆77 · Updated 5 months ago
- ☆32 · Updated 2 months ago
- ☆17 · Updated last year
- 56 language, 1 model Multilingual ASR ☆24 · Updated 3 years ago
- ☆32 · Updated 2 months ago
- ☆74 · Updated 3 years ago
- Some fast-ish algorithms for batch text search in moderate-sized collections, intended for data cleanup ☆58 · Updated 2 months ago
- A sequence-to-sequence voice conversion toolkit. ☆86 · Updated 4 months ago
- Pronunciation-assisted Subword Modeling ☆29 · Updated 5 years ago
- ☆33 · Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer ☆44 · Updated 4 months ago
- Segment a given audio into utterances using a trained end-to-end ASR model. ☆73 · Updated 4 years ago
- VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization. ☆46 · Updated 6 months ago
- phoneme tokenizer and grapheme-to-phoneme model for 8k languages ☆144 · Updated last year
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆84 · Updated last month
- Adnabod lleferydd Cymraeg i'r Gymraeg gyda HuggingFace // Speech Recognition for Welsh with HuggingFace ☆14 · Updated last year
- Online streaming speaker change detection model in Pytorch ☆36 · Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 · Updated last month
- Convert English text from written expressions into spoken forms ☆21 · Updated 2 years ago
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆99 · Updated last year