Compound splitter for the German language ("Komposita-Zerlegung") based on a large dictionary combined with highly efficient multi-pattern string search
☆35 · Jul 7, 2022 · Updated 3 years ago
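The description above combines two ideas: locating dictionary words inside a compound, then choosing a segmentation from those matches. A minimal sketch of that idea follows. It is not the repository's implementation: the tiny `DICTIONARY`, the function names `find_matches` and `split_compound`, and the "fewest parts wins" rule are all assumptions made for illustration, and the naive substring scan merely stands in for a real multi-pattern search algorithm.

```python
# Hypothetical toy dictionary; a real splitter would load a large word list.
DICTIONARY = {"donau", "dampf", "schiff", "fahrt", "dampfschiff", "haus", "tür"}


def find_matches(word, dictionary):
    """Return {start: [end, ...]} for every dictionary entry found in word.

    Naive scan over all substrings, standing in for an efficient
    multi-pattern string search.
    """
    matches = {}
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            if word[start:end] in dictionary:
                matches.setdefault(start, []).append(end)
    return matches


def split_compound(word, dictionary=DICTIONARY):
    """Split word into the fewest dictionary parts; return [word] if none fit."""
    lowered = word.lower()
    matches = find_matches(lowered, dictionary)
    # best[i] holds the shortest list of parts that exactly covers lowered[i:].
    best = {len(lowered): []}
    for start in range(len(lowered) - 1, -1, -1):
        candidates = [
            [lowered[start:end]] + best[end]
            for end in matches.get(start, [])
            if end in best
        ]
        if candidates:
            best[start] = min(candidates, key=len)
    return best.get(0, [word])


if __name__ == "__main__":
    print(split_compound("Donaudampfschifffahrt"))
    print(split_compound("Haustür"))
```

The sketch ignores linking elements such as the Fugen-s and does no ranking beyond counting parts, so it should be read only as an outline of the dictionary-plus-search approach.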
Alternatives and similar repositories for german_compound_splitter
Users who are interested in german_compound_splitter are comparing it to the libraries listed below
- Compound splitter for German ☆112 · Apr 5, 2020 · Updated 5 years ago
- Adnabod lleferydd Cymraeg i'r Gymraeg gyda HuggingFace // Speech Recognition for Welsh with HuggingFace ☆13 · Nov 29, 2022 · Updated 3 years ago
- Repo for the simplified text alignment tools. ☆21 · Dec 4, 2020 · Updated 5 years ago
- Alignment and annotation for comparable documents. ☆22 · Oct 16, 2018 · Updated 7 years ago
- Legal Reference Extraction ☆43 · Feb 13, 2026 · Updated 2 weeks ago
- Python code to automatically produce a summary of a piece of text. ☆12 · Sep 8, 2016 · Updated 9 years ago
- Python module to clean and transliterate (i.e. normalize) German text including abbreviations, numbers, timestamps etc. It can be used to… ☆36 · Jan 16, 2021 · Updated 5 years ago
- Coqui Inference Engine ☆40 · Aug 3, 2021 · Updated 4 years ago
- Website for the KGC 2020 Tutorial: "Building a Knowledge Graph from schema.org annotations" ☆10 · Jun 26, 2020 · Updated 5 years ago
- Material for a course on Advanced NLP ☆14 · Jul 22, 2025 · Updated 7 months ago
- Coding utilities for quantitative legal studies ☆14 · Dec 7, 2025 · Updated 2 months ago
- ☆11 · Jan 27, 2026 · Updated last month
- Apertium linguistic data for Catalan ☆11 · Updated this week
- XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation… ☆10 · Nov 4, 2022 · Updated 3 years ago
- Using YouTube to prepare a speech recognition dataset for any language ☆10 · Mar 30, 2021 · Updated 4 years ago
- Tool for creating Kaldi nnet3 recipes using the International Phonetic Alphabet (IPA) ☆10 · Jun 2, 2021 · Updated 4 years ago
- Poetry Corpora Annotated on Aesthetic Emotions ☆12 · Aug 2, 2022 · Updated 3 years ago
- Code for "Error-driven Fixed-Budget ASR Personalization for Accented Speakers" in ICASSP 2021 ☆11 · Jun 13, 2021 · Updated 4 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component ☆13 · Aug 10, 2023 · Updated 2 years ago
- Automatically exported from code.google.com/p/hunpos ☆12 · Apr 9, 2018 · Updated 7 years ago
- Lyrics crawling, pre-processing, embedding generation, model training, and lyrics generation, all in one tool ☆14 · Nov 4, 2018 · Updated 7 years ago
- wav2rtp is a simple tool intended to convert speech data from wav files to an RTP data stream ☆14 · Aug 15, 2021 · Updated 4 years ago
- Extension for pie to include taggers with their models and pre/postprocessors ☆11 · May 30, 2024 · Updated last year
- IPA Phonetic dataset lexicon ☆18 · Feb 22, 2026 · Updated last week
- Ukrainian ELECTRA model ☆12 · Mar 11, 2023 · Updated 2 years ago
- TSAR2022 Shared Task on Lexical Simplification: Datasets and Evaluation scripts ☆10 · Oct 27, 2022 · Updated 3 years ago
- Wikipedia text corpus for self-supervised NLP model training ☆46 · Jul 17, 2022 · Updated 3 years ago
- A neural network hyphenator for the German language ☆45 · Oct 25, 2023 · Updated 2 years ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models. ☆13 · Feb 13, 2021 · Updated 5 years ago
- ☆11 · Dec 8, 2022 · Updated 3 years ago
- A small package for handy conversion of German numerals (also ordinal / signed) written as words to numbers. ☆12 · Jan 22, 2026 · Updated last month
- Awesome stuff made by the Mycroft community ☆13 · Sep 16, 2021 · Updated 4 years ago
- A DH abstracts conversion tool ☆13 · Mar 18, 2025 · Updated 11 months ago
- A Python library to generate highly realistic typos (fuzz-testing) ☆13 · Mar 16, 2025 · Updated 11 months ago
- Common Lisp implementation of the Zipper data structure first described by Gérard Huet. ☆15 · Dec 21, 2017 · Updated 8 years ago
- ☆10 · Nov 1, 2025 · Updated 4 months ago
- Small projects using the OpenAI API. ☆13 · Mar 21, 2025 · Updated 11 months ago
- Yet another heatmap generator for rtl_power csv file ☆11 · Aug 24, 2025 · Updated 6 months ago
- PAVOQUE Corpus of Expressive Speech ☆12 · Aug 2, 2016 · Updated 9 years ago