Toolkit to obtain and preprocess German text corpora, train models and evaluate them with generated test sets. Built with Gensim and TensorFlow.
☆242Aug 21, 2024Updated last year
Alternatives and similar repositories for GermanWordEmbeddings
Users who are interested in GermanWordEmbeddings compare it to the libraries listed below
- Language Model and Text Classification for German Language using Deep Learning☆18Jun 15, 2018Updated 7 years ago
- A lemmatizer for German language text☆94Feb 7, 2023Updated 3 years ago
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.☆28Dec 15, 2019Updated 6 years ago
- Any contributions to the NLTK project☆29May 8, 2014Updated 11 years ago
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German☆517Oct 30, 2024Updated last year
- GermaNER: Free Open German Named Entity Recognition Tool☆36Dec 16, 2023Updated 2 years ago
- Ten Thousand German News Articles Dataset for Topic Classification☆87Nov 7, 2022Updated 3 years ago
- Plan and train German transformer models.☆23Feb 22, 2021Updated 5 years ago
- Parser for the plenary protocols of the Bundestag☆21Jul 17, 2017Updated 8 years ago
- The Potsdam Twitter Sentiment Corpus☆18Jan 15, 2020Updated 6 years ago
- German lemmatization with IWNLP as extension for spaCy☆26Jul 28, 2023Updated 2 years ago
- Coreference resolution for German☆16Jun 26, 2017Updated 8 years ago
- Annotated data set consisting of user comments posted to a German-language newspaper website☆17Jun 28, 2018Updated 7 years ago
- This repository contains all manually labeled data from the GermEval-2018 shared task.☆29Sep 28, 2018Updated 7 years ago
- GermaParl: Corpus of Plenary Protocols of the German Bundestag (TEI Format)☆37Jun 1, 2023Updated 2 years ago
- This is a prototype of a semi-automatic data anonymization app for German documents.☆23Mar 6, 2023Updated 2 years ago
- This is a prototype of a multi-lingual suite for named-entity recognition in Python.☆21Apr 25, 2024Updated last year
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- 📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF☆39Mar 8, 2022Updated 3 years ago
- I analysed online user comments on articles by German news publishers SPON, ZEIT, and Focus☆19Feb 3, 2018Updated 8 years ago
- Simple CORPORA list crawler☆10Dec 2, 2016Updated 9 years ago
- ☆11Jan 27, 2026Updated last month
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models☆158Dec 6, 2022Updated 3 years ago
- Presentations & notebooks from our talks/workshops/meetups/etc.☆24Mar 23, 2018Updated 7 years ago
- Slides and code examples for my talks☆26May 18, 2025Updated 9 months ago
- all-paths graph kernel for protein-protein interaction extraction☆12Apr 22, 2014Updated 11 years ago
- SMOR (Stuttgart Morphology) with alternative lemmatization component☆13Aug 10, 2023Updated 2 years ago
- Compound splitter for German☆112Apr 5, 2020Updated 5 years ago
- An unsupervised compound splitter☆42Oct 6, 2019Updated 6 years ago
- This is a German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l…☆23Feb 22, 2022Updated 4 years ago
- ☆11Apr 22, 2018Updated 7 years ago
- An R data package containing georeferenced events of right-wing violence in Germany from 2014 onwards☆11Jun 27, 2018Updated 7 years ago
- A bunch of scripts for investigating Reddit☆11Feb 2, 2017Updated 9 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer☆10Oct 15, 2019Updated 6 years ago
- Create and analyze argument graphs and serialize them via Protobuf☆10Feb 23, 2026Updated last week
- A tokenizer and sentence splitter for German and English web and social media texts.☆153Dec 9, 2024Updated last year
- German stopwords collection☆88Oct 6, 2022Updated 3 years ago
- ARCADE198 Dataset from the ACL 2018 MRQA Workshop☆15Oct 29, 2018Updated 7 years ago
- Information extraction from English and German texts based on predicate logic☆394Jul 8, 2022Updated 3 years ago