kuhumcst / cstlemma
Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules (affix: prefix, infix, suffix, circumfix), which are obtained by supervised learning from a list of full-form/lemma pairs.
☆35 · Updated 4 months ago
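cstlemma's own rule format and training procedure are not described on this page. The sketch below only illustrates what learning affix rules from full-form/lemma pairs can mean. It assumes suffix rules only (no prefix, infix or circumfix handling) and uses a tiny hypothetical English training list. The names `suffix_rule`, `learn` and `lemmatise` are illustrative and are not part of cstlemma.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def suffix_rule(form: str, lemma: str) -> tuple[str, str]:
    """Return (strip, add): the ending to remove from `form` and the ending to append."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1  # i = length of the prefix shared by form and lemma
    return form[i:], lemma[i:]


def learn(pairs: list[tuple[str, str]]) -> dict[str, Counter]:
    """Supervised step: for each stripped ending, count which lemma endings replace it."""
    table: dict[str, Counter] = defaultdict(Counter)
    for form, lemma in pairs:
        strip, add = suffix_rule(form, lemma)
        table[strip][add] += 1
    return table


def lemmatise(word: str, table: dict[str, Counter]) -> str:
    """Apply the most frequent rule for the longest ending of `word` seen in training."""
    for start in range(len(word) + 1):
        ending = word[start:]  # longest ending first, "" last
        if ending in table:
            add = table[ending].most_common(1)[0][0]
            return word[:start] + add  # drop the ending, append the lemma ending
    return word  # no learned rule matches: leave the word unchanged


# Hypothetical toy training data (full form, lemma).
pairs = [("walked", "walk"), ("talked", "talk"), ("cities", "city"),
         ("ponies", "pony"), ("dogs", "dog"), ("cats", "cat"), ("ran", "run")]
table = learn(pairs)
print(lemmatise("babies", table), lemmatise("jumped", table))
```

Preferring the longest matching ending lets the rule `ies → y` win over the more general `s → ""`. A learner this small also overgeneralises: because it was trained on `ran → run`, it turns `can` into `cun`. Rules need more context than a bare ending to avoid errors like this.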
Related projects
Alternatives and complementary repositories for cstlemma
- Lexicons for the Multilingual UCREL Semantic Analysis System ☆39 · Updated last year
- Multi Tier Annotation Search ☆26 · Updated 3 years ago
- linguistics backend ☆40 · Updated last year
- Wrapper for DKPro Core to extract linguistic information from books. ☆16 · Updated 2 years ago
- FoLiA Linguistic Annotation Tool -- Flat is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆110 · Updated 4 months ago
- LanguageTool-style grammar handling with spaCy 2.0 ☆42 · Updated 6 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆61 · Updated 6 months ago
- Repository for code and metadata to support work described in "Authorless Topic Models: Biasing Models Away from Known Structure" ☆28 · Updated 4 years ago
- KenLM extension for spaCy 2.0. ☆16 · Updated 6 years ago
- Regex-like tree pattern matching that works on a sentence's tree instead of on strings ☆42 · Updated 6 years ago
- Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/ ☆17 · Updated 6 years ago
- Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description… ☆11 · Updated last year
- Python library providing sentiment lexicons. ☆26 · Updated 7 years ago
- Tutorial on NE processing for Digital Humanities - DH Utrecht 2019 ☆25 · Updated 5 years ago
- The Potsdam Twitter Sentiment Corpus ☆17 · Updated 4 years ago
- spaCy + UDPipe ☆161 · Updated 2 years ago
- A Python module for interfacing with the TreeTagger by Helmut Schmid. ☆77 · Updated 3 years ago
- Sentiment Lexicon Generation Suite ☆15 · Updated 6 years ago
- C++ implementation of Generalised Brown clustering and Python scripts for feature generation ☆41 · Updated 8 years ago
- Compare the accuracies of UDPipe models and spaCy models that can be used for NLP annotation ☆14 · Updated 6 years ago
- Hunspell extension for spaCy 2.0. ☆94 · Updated 3 months ago
- Featurize words into orthographic and phonological vectors. ☆40 · Updated last year
- DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆50 · Updated 4 years ago
- Software for multi-level annotation of linguistic corpora ☆17 · Updated 4 years ago
- Unsupervised method for extracting quotation-speaker pairs from large news corpora. ☆28 · Updated 6 years ago
- German lemmatization with IWNLP as an extension for spaCy ☆24 · Updated last year
- Python tools for text ☆15 · Updated 4 years ago
- Program used to split text into segments ☆25 · Updated 3 weeks ago
- SMOR (Stuttgart Morphology) with an alternative lemmatization component ☆12 · Updated last year
- Python Multilingual Ucrel Semantic Analysis System ☆30 · Updated 3 months ago