repodiac / german_compound_splitter
Compound splitter for the German language ("Komposita-Zerlegung"), based on a large dictionary combined with highly efficient multi-pattern string search
☆22Updated 2 years ago
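As a rough illustration of dictionary-based compound splitting, here is a minimal Python sketch. It is not the project's implementation: it swaps the multi-pattern string search for plain set lookups with memoized recursion, and the names `DICTIONARY`, `LINKERS`, and `split_compound` as well as the tiny word list are assumptions made up for this example.

```python
from functools import lru_cache

# Hypothetical toy dictionary; a real splitter would load a large word list.
DICTIONARY = {"donau", "dampf", "schiff", "fahrt", "haus", "tür", "arbeit", "amt"}

# Illustrative subset of German linking elements (Fugenelemente).
# The empty string means "no linking element".
LINKERS = ("", "s", "es", "n", "en")


def split_compound(word, dictionary=DICTIONARY, min_len=3):
    """Return a split of `word` into dictionary words with the fewest parts,
    or None if the word cannot be fully covered by dictionary entries."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start):
        # The whole word has been consumed: an empty tail is a valid split.
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + min_len, len(word) + 1):
            part = word[start:end]
            if part not in dictionary:
                continue
            for linker in LINKERS:
                if not word.startswith(linker, end):
                    continue
                nxt = end + len(linker)
                # A linking element may only stand between two parts.
                if linker and nxt == len(word):
                    continue
                rest = best(nxt)
                if rest is not None:
                    candidates.append((part,) + rest)
        return min(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    for w in ("Donaudampfschifffahrt", "Haustür", "Arbeitsamt", "xyz"):
        print(w, split_compound(w))
```

With a large dictionary, checking every substring this way becomes the bottleneck, which is presumably why the project pairs the dictionary with a multi-pattern search that finds all dictionary hits in a single pass over the word.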
Related projects:
- Compound splitter for German☆102Updated 4 years ago
- SIGMORPHON 2022 Shared Task on Morpheme Segmentation☆23Updated last year
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆36Updated last year
- Python Finite-State Toolkit☆39Updated last month
- Open German WordNet☆87Updated 7 months ago
- UIMA CAS processing library written in Python☆84Updated 4 months ago
- Small-vocabulary sequence-to-sequence generation with optional feature conditioning☆29Updated last week
- A tokenizer and sentence splitter for German and English web and social media texts.☆135Updated last month
- ☆42Updated last month
- Catalan BERT model☆12Updated 3 years ago
- German Morphological Analyzer☆45Updated 2 years ago
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l…☆22Updated 2 years ago
- A small package for handy conversion of German numerals (also ordinal / signed) written as words to numbers.☆12Updated last year
- Named Entity Recognition (LSTM + CRF + FastText) with models for [historic] German☆26Updated 3 years ago
- A part-of-speech tagger with support for domain adaptation and external resources.☆22Updated last year
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆148Updated 3 months ago
- Compiled tools, datasets, and other resources for historical text normalization.☆16Updated 5 years ago
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages.☆27Updated last year
- Plan and train German transformer models.☆22Updated 3 years ago
- GlotScript: A Resource and Tool for Low Resource Writing System Identification -- LREC 2024☆13Updated 3 months ago
- Efficient Low-Memory Aligner☆135Updated 2 weeks ago
- coFR: COreference resolution tool for FRench (and singletons).☆24Updated 4 years ago
- A neural dependency parser that does its best☆13Updated last week
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency☆139Updated last month
- ☆61Updated 4 months ago
- Curriculum training☆15Updated this week
- BERT and ELECTRA models trained on Europeana Newspapers☆35Updated 2 years ago
- Bicleaner fork that uses neural networks☆37Updated last month
- A Language-Independent Unsupervised Morphological Segmentation Framework based on Adaptor Grammars☆15Updated 3 months ago
- Deutsches Lyrik Korpus (DLK) / German Poetry Corpus☆17Updated 3 months ago