A tokenizer and sentence splitter for German and English web and social media texts.
☆152Dec 9, 2024Updated last year
Alternatives and similar repositories for SoMaJo
Users that are interested in SoMaJo are comparing it to the libraries listed below.
- A part-of-speech tagger with support for domain adaptation and external resources.☆24Oct 26, 2022Updated 3 years ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models☆158Dec 6, 2022Updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers☆39Dec 14, 2021Updated 4 years ago
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German☆526Oct 30, 2024Updated last year
- Format conversion and graphical representation of [Universal Dependencies](http://universaldependencies.org) trees.☆12Sep 3, 2024Updated last year
- OCR post correction for old German corpus☆20Aug 29, 2022Updated 3 years ago
- German GPT-2 model☆32Aug 17, 2021Updated 4 years ago
- German dataset for DPR model training☆19Jul 21, 2024Updated last year
- A Dataset of German Legal Documents for Named Entity Recognition☆178Oct 19, 2022Updated 3 years ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri…☆35Jul 7, 2022Updated 3 years ago
- GermaParl: Corpus of Plenary Protocols of the German Bundestag (TEI Format)☆38Jun 1, 2023Updated 3 years ago
- Python wrapper for the CWB to extract concordances and score frequency lists☆22May 11, 2026Updated last month
- Ten Thousand German News Articles Dataset for Topic Classification☆87Nov 7, 2022Updated 3 years ago
- Plan and train German transformer models.☆23Feb 22, 2021Updated 5 years ago
- Wikipedia text corpus for self-supervised NLP model training☆47Jul 17, 2022Updated 3 years ago
- Compound splitter for German☆112Apr 5, 2020Updated 6 years ago
- Python code to automatically produce a summary of a piece of text.☆11Sep 8, 2016Updated 9 years ago
- A minimal, pure Python library to interface with CoNLL-U format files.☆153May 11, 2026Updated last month
- Data for the HIPE 2022 shared task.☆23May 15, 2026Updated last month
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Use spaCy for NLP and output to the FoLiA XML format.☆12Feb 27, 2024Updated 2 years ago
- A German ELMo deep contextualized word representation, trained on a special German Wikipedia text corpus.☆28Dec 15, 2019Updated 6 years ago
- A data set and model for German sentiment classification.☆70Jun 10, 2026Updated last week
- A German text corpus from Wikipedia. It is cleaned, preprocessed and sentence-split. Its purpose is to train NLP embeddings l…☆23Feb 22, 2022Updated 4 years ago
- German lemmatization with IWNLP as extension for spaCy☆27Apr 13, 2026Updated 2 months ago
- ☆14Jan 25, 2026Updated 4 months ago
- German-language introduction to automated content analysis with R.☆18Sep 11, 2020Updated 5 years ago
- GermaNER: Free Open German Named Entity Recognition Tool☆37Dec 16, 2023Updated 2 years ago
- ☆18Feb 1, 2023Updated 3 years ago
- Automating the Bechdel test and its variants for feminine representation in movies with AI☆37Nov 22, 2023Updated 2 years ago
- Named Entity Recognition data for Europeana Newspapers☆173Apr 5, 2023Updated 3 years ago
- This repository contains all manually labeled data from the GermEval-2018 shared task.☆29Sep 28, 2018Updated 7 years ago
- R package for working with the CCS Annotator☆13Mar 14, 2024Updated 2 years ago
- Suffix array construction and searching algorithms for in-memory binary data.☆12Sep 10, 2022Updated 3 years ago
- Master thesis: Exploring bias in German NLG (GPT-3 & GerPT-2). Applies regard classification and bias mitigation triggers.☆16Sep 25, 2024Updated last year
- OCRopus model for Gothic print (Fraktur)☆19Feb 16, 2020Updated 6 years ago
- A web application for tagging and retrieving arguments in text☆30May 1, 2023Updated 3 years ago
- Stemmer for German☆45Apr 29, 2022Updated 4 years ago
- Automatic Detection of Potentially Idiomatic Expressions☆12Feb 19, 2021Updated 5 years ago