kuhumcst / cstlemma
Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and dozens of other languages. It uses affix rules (prefix, infix, suffix, circumfix) that are learned by supervised training on a list of full-form/lemma pairs.
☆ 36 · updated last month
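To illustrate what "learning affix rules from a full-form/lemma list" means, here is a minimal sketch that handles suffixes only. It is a simplified illustration, not cstlemma's actual algorithm or API. cstlemma also handles prefixes, infixes and circumfixes and chooses between rules far more carefully. The function names `suffix_rule`, `train` and `lemmatise` are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def suffix_rule(form: str, lemma: str) -> tuple[str, str]:
    """Derive a (strip, append) rule: remove `strip` from the end of `form`, then add `append`."""
    # Length of the longest common prefix of form and lemma.
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


def train(pairs: list[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Supervised step: map each stripped suffix to its most frequent (strip, append) rule."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for form, lemma in pairs:
        rule = suffix_rule(form, lemma)
        counts[rule[0]][rule] += 1
    return {suffix: c.most_common(1)[0][0] for suffix, c in counts.items()}


def lemmatise(word: str, rules: dict[str, tuple[str, str]]) -> str:
    """Apply the rule for the longest word ending that was seen as a stripped suffix."""
    for k in range(len(word), -1, -1):
        suffix = word[len(word) - k:]  # k == 0 yields the empty suffix
        if suffix in rules:
            strip, append = rules[suffix]
            return word[: len(word) - len(strip)] + append
    return word  # no rule matches: leave the word unchanged


if __name__ == "__main__":
    pairs = [("walked", "walk"), ("jumped", "jump"), ("cats", "cat"), ("dogs", "dog"), ("running", "run")]
    rules = train(pairs)
    print(lemmatise("talked", rules))  # → talk
    print(lemmatise("birds", rules))   # → bird
```

Keying rules only by the stripped suffix is deliberately naive. For example, it would turn "bed" into "b". This kind of over-generalisation is why a practical lemmatiser needs more context when it selects a rule.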
Alternatives and similar repositories for cstlemma:
Users interested in cstlemma also compare it with the libraries listed below:
- A Python module for interfacing with the TreeTagger by Helmut Schmid. ☆ 75 · updated 3 years ago
- German lemmatization with IWNLP as an extension for spaCy. ☆ 24 · updated last year
- FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆ 63 · updated 10 months ago
- German morphological analyzer. ☆ 47 · updated 3 years ago
- Sentiment lexicon generation suite. ☆ 15 · updated 7 years ago
- Lexicons for the Multilingual UCREL Semantic Analysis System. ☆ 41 · updated last year
- Wrapper for DKPro Core to extract linguistic information from books. ☆ 16 · updated 3 years ago
- Hunspell extension for spaCy 2.0. ☆ 94 · updated 7 months ago
- Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description… ☆ 11 · updated 2 years ago
- FoLiA Linguistic Annotation Tool: FLAT is a web-based linguistic annotation environment based around the FoLiA format (http://proycon.g… ☆ 112 · updated 2 months ago
- Multi-tier annotation search. ☆ 26 · updated 3 years ago
- BERT and ELECTRA models trained on Europeana Newspapers. ☆ 37 · updated 3 years ago
- PDF Table Extractor: repository holding a revisable version of the code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz. ☆ 38 · updated last year
- Language detection extension for spaCy 2.0+. ☆ 112 · updated 6 years ago
- Program used to split text into segments. ☆ 25 · updated 4 months ago
- KenLM extension for spaCy 2.0. ☆ 16 · updated 7 years ago
- Python port of IWNLP.Lemmatizer. ☆ 17 · updated last year
- DKPro C4CorpusTools is a collection of tools for processing the CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆ 52 · updated 4 years ago
- A Java UIMA-based toolbox for multilingual and efficient terminology extraction and multilingual term alignment. ☆ 38 · updated 7 years ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆ 12 · updated last year
- Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/ ☆ 17 · updated 6 years ago
- 📂 Additional lookup tables and data resources for spaCy. ☆ 105 · updated last month
- Machine translation for the real world. ☆ 23 · updated 5 years ago
- Compares the accuracy of UDPipe and spaCy models that can be used for NLP annotation. ☆ 14 · updated 7 years ago
- C++ implementation of generalised Brown clustering and Python scripts for feature generation. ☆ 41 · updated 8 years ago
- ADEL is a robust and efficient entity linking framework that is adaptive to text genres and language, entity types for the classification… ☆ 19 · updated 5 years ago
- Open-source tools for morphological tagging, segmentation and stemming. ☆ 41 · updated 5 years ago
- Distributed infrastructure for machine translation web services (using Moses, Python, JSON-RPC/web interface). ☆ 33 · updated 3 years ago
- Linguistics backend. ☆ 41 · updated 2 years ago