snowballstem / snowball-data
Test data for snowball stemming algorithms
☆34 Updated 2 months ago
Alternatives and similar repositories for snowball-data
Users who are interested in snowball-data are comparing it to the libraries listed below.
- 📖 Library that provides ways to read from and iterate through the Wikibase entities in a Wikibase Repository JSON dump ☆74 Updated last year
- Official releases of the PROIEL treebank of ancient Indo-European languages ☆37 Updated 2 years ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆35 Updated 2 years ago
- Morphological analyzer and lemmatizer for Latin. ☆27 Updated 6 months ago
- FreeLing project source code ☆258 Updated 2 years ago
- SCOWL (and friends). ☆441 Updated last month
- Perseus Treebank Data ☆73 Updated last year
- Lexical database of any language ☆183 Updated 3 years ago
- Website source for snowballstem.org ☆17 Updated last week
- A fast and accurate POS and morphological tagging toolkit (EACL 2014) ☆141 Updated 5 years ago
- The NLG tool for Finnish ☆23 Updated last year
- Linguistica 5: Unsupervised Learning of Linguistic Structure ☆30 Updated 6 years ago
- Tutorials for the CLTK ☆52 Updated 4 years ago
- A language evolution simulator, using realistic phonetic changes. ☆38 Updated 2 years ago
- Miscellaneous materials for teaching NLP using NLTK ☆37 Updated 7 years ago
- All languages stopwords collection ☆452 Updated last year
- Python stemming library using snowball stemmers ☆263 Updated 2 weeks ago
- The curation repository for the data behind Concepticon. ☆39 Updated last week
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr… ☆69 Updated 2 months ago
- Public repository for Coptic SCRIPTORIUM Corpora Releases ☆35 Updated last month
- CiteSeerX public repository ☆133 Updated last year
- CRF-based Morphological Tagging and Lemmatization ☆37 Updated 5 years ago
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆386 Updated last month
- Automatically exported from code.google.com/p/chromium-compact-language-detector ☆162 Updated 4 years ago
- Fast corpus search engine originally made for the Corpus of Written Tatar language ☆17 Updated 5 years ago
- A multilingual parallel corpus created from translations of the Bible. ☆184 Updated 3 months ago
- List of common stop words in various languages. ☆337 Updated 2 years ago
- The CMU Link Grammar natural language parser ☆400 Updated 4 months ago
- English stopwords collection ☆163 Updated 8 years ago
- A simple collocation-driven recognition of rhymes. Contains pre-trained models for Czech, Dutch, English, French, German, Russian, and Sp… ☆30 Updated 2 months ago