tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆856 · Updated this week
Alternatives and similar repositories for wiktextract:
Users who are interested in wiktextract compare it to the libraries listed below.
- A Python Wiktionary Parser ☆357 · Updated last year
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 · Updated this week
- Inflecting Finnish words (verb inflection, comparatives, cases, possessive suffixes, clitics) using Wiktionary-compatible declensions and… ☆31 · Updated 4 years ago
- Gather modern English word frequencies from all enwiki articles. ☆211 · Updated 11 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆151 · Updated 3 months ago
- The Open English WordNet ☆507 · Updated this week
- Repository for Frequency Word List Generator and processed files ☆1,219 · Updated 3 years ago
- Sentence aligner ☆109 · Updated 3 years ago
- Machine-readable Wiktionary ☆75 · Updated 9 months ago
- A modern, interlingual wordnet interface for Python ☆232 · Updated 2 weeks ago
- Offline bilingual dictionaries made using data from Wiktionary ☆52 · Updated 9 years ago
- Offline database of synonyms/thesaurus ☆191 · Updated last year
- Crawler for linguistic corpora ☆199 · Updated last year
- German part-of-speech dictionary ☆43 · Updated last year
- Webster's English Dictionary in JSON format, and related Swift parsing utility ☆418 · Updated last year
- LingPy: Python library for quantitative tasks in historical linguistics ☆128 · Updated last year
- A Python library to parse MediaWiki WikiText ☆299 · Updated 4 months ago
- A python module for English lemmatization and inflection. ☆265 · Updated last year
- The World Atlas Of Language Structures Online ☆126 · Updated last month
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆73 · Updated 2 months ago
- Joe Speigle's Korean/English dictionary database ☆110 · Updated last year
- An open etymology dataset created using Wiktionary data. Contains 3.8M entries, 1.8M terms, 2900 languages, and 31 unique relationship ty… ☆89 · Updated 9 months ago
- Access a database of word frequencies, in various natural languages. ☆1,434 · Updated last month
- Perseus Treebank Data ☆71 · Updated 8 months ago
- Simple sentence mining tool for language learning ☆420 · Updated 2 months ago
- Public-domain Python library for flashcard quiz scheduling using Bayesian statistics. (JavaScript, Java, Dart, and other ports available!… ☆317 · Updated 4 months ago
- Morphological Dictionaries for German Language ☆28 · Updated 6 years ago
- A neural word aligner based on multilingual BERT ☆338 · Updated 2 years ago
- English Lemma Database - Compiled by Referencing British National Corpus ☆29 · Updated 4 months ago
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) ☆681 · Updated 3 weeks ago