tsproisl / SoMaJo
A tokenizer and sentence splitter for German and English web and social media texts.
☆147 Updated 9 months ago
Alternatives and similar repositories for SoMaJo
Users who are interested in SoMaJo compare it to the libraries listed below.
- A minimal, pure Python library to interface with CoNLL-U format files. ☆152 Updated this week
- Text tokenization and sentence segmentation (segtok v2) ☆206 Updated 3 years ago
- UIMA CAS processing library written in Python ☆90 Updated 3 months ago
- spaCy + UDPipe ☆163 Updated 3 years ago
- DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models ☆155 Updated 2 years ago
- Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Do… ☆82 Updated last year
- A Dataset of German Legal Documents for Named Entity Recognition ☆173 Updated 2 years ago
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary. ☆317 Updated 2 months ago
- Ten Thousand German News Articles Dataset for Topic Classification ☆86 Updated 2 years ago
- Compound splitter for German