snowballstem / snowball-data
Test data for snowball stemming algorithms
★33 · Updated last week
Alternatives and similar repositories for snowball-data
Users who are interested in snowball-data compare it to the libraries listed below.
- ElixirFM Functional Arabic Morphology ★43 · Updated 2 years ago
- Library that provides ways to read from and iterate through the Wikibase entities in a Wikibase Repository JSON dump ★74 · Updated 11 months ago
- Software and resources for natural language processing. ★131 · Updated 8 years ago
- Mishtar: Named and temporal entities chunker ★13 · Updated 4 years ago
- AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training ★42 · Updated 11 years ago
- SymSpellCompound: compound aware automatic spelling correction ★66 · Updated 7 years ago
- Bilingual sentence aligner (Gale & Church, 1993) ★14 · Updated 6 years ago
- A Javascript Implementation of the Porter Stemmer ★96 · Updated 3 years ago
- Model Training tool for MITIE ★79 · Updated 9 years ago
- Comparable documents miner: Arabic-English morphological analysis, text processing, n-gram features extraction, POS tagging, dictionary t… ★34 · Updated 8 years ago
- Full Stack of Latvian Language Resources for Natural Language Understanding (NLU) and Generation (NLG) ★15 · Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ★64 · Updated last year
- Search back-end for dependency tree search. See the docs at https://fginter.github.io/dep_search/ ★17 · Updated 7 years ago
- Snowball compiler and stemming algorithms ★796 · Updated last week
- Website source for snowballstem.org ★17 · Updated last week
- Official releases of the PROIEL treebank of ancient Indo-European languages ★37 · Updated 2 years ago
- Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic pr… ★68 · Updated this week
- Fast corpus search engine originally made for the Corpus of Written Tatar language ★17 · Updated 5 years ago
- Treex NLP framework ★32 · Updated this week
- A NoSketch Engine Docker image which is easy to use ★19 · Updated 2 weeks ago
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ★34 · Updated 2 years ago
- Morphological analyzer and lemmatizer for Latin. ★27 · Updated 4 months ago
- ★35 · Updated 6 years ago
- Generate arabic golden standard corpus for morphology and stemming ★12 · Updated 2 years ago
- Wiktionary parser tool for many language editions. ★54 · Updated 2 years ago
- command-line tool to extract taxonomies from Wikidata ★126 · Updated 6 years ago
- Automatically exported from code.google.com/p/guess-language ★53 · Updated last year
- Basic dataset for the linguistic data collection. ★15 · Updated 8 years ago
- A command line version of Koja Stemmer (An Arabic rooting algorithm) ★20 · Updated 8 years ago
- Hunspell analysis for ElasticSearch ★38 · Updated 13 years ago