LeonieWeissweiler / CISTEM
Stemmer for German
☆45 Updated 2 years ago
Alternatives and similar repositories for CISTEM:
Users who are interested in CISTEM are comparing it to the libraries listed below.
- small Java library for splitting German compound words ☆63 Updated 10 months ago
- A tokenizer and sentence splitter for German and English web and social media texts. ☆139 Updated 3 months ago
- Plan and train German transformer models. ☆23 Updated 4 years ago
- German part-of-speech dictionary ☆43 Updated last year
- Ten Thousand German News Articles Dataset for Topic Classification ☆84 Updated 2 years ago
- A lemmatizer for German language text ☆88 Updated 2 years ago
- Compound splitter for German ☆104 Updated 4 years ago
- Open German WordNet ☆93 Updated last year
- ☆18 Updated 2 months ago
- German stopwords collection ☆85 Updated 2 years ago
- The Zurich Dependency Parser for German ☆83 Updated 2 years ago
- German Morphological Analyzer ☆47 Updated 3 years ago
- Toolkit to obtain and preprocess German text corpora, train models and evaluate them with generated testsets. Built with Gensim and Tenso… ☆236 Updated 7 months ago
- Named Entity Recognition data for Europeana Newspapers ☆171 Updated last year
- The home repository of the NerKor corpus, a Hungarian gold standard named entity annotated corpus containing 1 million tokens. ☆15 Updated last year
- GermaNER: Free Open German Named Entity Recognition Tool ☆36 Updated last year
- The Hanover Tagger - A simple approach to lemmatization and POS-tagging of German morphology based on heuristics and hidden markov models… ☆51 Updated last week
- A machine learning tool for fishing entities ☆263 Updated this week
- A part-of-speech tagger with support for domain adaptation and external resources. ☆22 Updated 2 years ago
- AmbiverseNLU: A Natural Language Understanding suite by Max Planck Institute for Informatics ☆210 Updated last year
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri… ☆24 Updated 2 years ago
- GermaParl: Corpus of Plenary Protocols of the German Bundestag (TEI Format) ☆32 Updated last year
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ☆469 Updated 5 months ago
- German sentiment scores with SentiWS as extension for spaCy ☆37 Updated 2 years ago
- BERT and ELECTRA models trained on Europeana Newspapers ☆37 Updated 3 years ago
- ANNIS is an open source, versatile web browser-based search and visualization architecture for complex multilevel linguistic corpora with… ☆75 Updated last month
- spaCy + UDPipe ☆161 Updated 2 years ago
- Python framework for processing Universal Dependencies data ☆55 Updated last week
- NLP framework: sentence detector, tokeniser, pos-tagger and dependency parser ☆49 Updated last year
- TextComplexityDE dataset consists of 1000 sentences in the German language with subjective complexity rating, collected from German learn… ☆13 Updated 2 years ago