nytud / emtsv
e-magyar text processing system -- inter-module communication via tsv + REST API
☆31 · Updated 5 months ago
Alternatives and similar repositories for emtsv
Users interested in emtsv are comparing it to the libraries listed below.
- Universal Dependencies online documentation ☆288 · Updated this week
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆78 · Updated this week
- The home repository of the NerKor corpus, a Hungarian gold standard named entity annotated corpus containing 1 million tokens. ☆16 · Updated 2 years ago
- All languages stopwords collection ☆476 · Updated 2 years ago
- Various utilities for processing the data. ☆217 · Updated last week
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆185 · Updated 8 months ago
- A modern, interlingual wordnet interface for Python ☆282 · Updated last week
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆393 · Updated 2 weeks ago
- Compound splitter for German ☆112 · Updated 5 years ago
- Open German WordNet ☆100 · Updated last month
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆418 · Updated last year
- German Morphological Analyzer ☆51 · Updated 4 years ago
- JSON-NLP Schema for transfer of NLP output using JSON ☆54 · Updated 5 years ago
- A curated list of NLP resources for Hungarian ☆268 · Updated 3 weeks ago
- Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern stri… ☆34 · Updated 3 years ago
- Faster, modernized fork of the language identification tool langid.py ☆60 · Updated last year
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat… ☆164 · Updated last year
- A character-wise tokenizer for morphologically rich languages ☆31 · Updated 4 months ago
- Text tokenization and sentence segmentation (segtok v2) ☆208 · Updated 3 years ago
- A neural parsing pipeline for segmentation, morphological tagging, dependency parsing and lemmatization with pre-trained models for more … ☆115 · Updated last year
- A tokenizer and sentence splitter for German and English web and social media texts. ☆151 · Updated last year
- spacy-wordnet creates annotations that easily allow the use of wordnet and wordnet domains by using the nltk wordnet interface ☆261 · Updated 5 months ago
- A multilingual parallel corpus created from translations of the Bible. ☆191 · Updated 8 months ago
- A Python library to parse MediaWiki WikiText ☆317 · Updated 8 months ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆255 · Updated 3 years ago
- 🆕 Work continues on INCEpTION 👉 https://github.com/inception-project/inception 👈 -- ⚠️ The official WebAnno repository has reached the… ☆250 · Updated 2 years ago
- A Directory of Online Newspaper Sources for 70+ Languages ☆31 · Updated 4 years ago
- This packages up data for the Open Multilingual Wordnet ☆60 · Updated last week
- UIMA CAS processing library written in Python ☆91 · Updated this week
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆113 · Updated last month