tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆927 Updated this week
Alternatives and similar repositories for wiktextract
Users who are interested in wiktextract are comparing it to the libraries listed below.
- A Python Wiktionary Parser ☆360 Updated 3 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆101 Updated 2 weeks ago
- The Open English WordNet ☆558 Updated last week
- Ebook reader dictionaries extracted from Wiktionary in almost all languages, in Stardict, Tabfile and Kindle format ☆97 Updated 2 years ago
- Offline bilingual dictionaries made using data from Wiktionary ☆55 Updated 10 years ago
- Gather modern English word frequencies from all enwiki articles. ☆213 Updated last year
- Inflecting Finnish words (verb inflection, comparatives, cases, possessive suffixes, clitics) using Wiktionary-compatible declensions and… ☆32 Updated 4 years ago
- Spanish to English dictionary, frequency list, and lemma data ☆33 Updated 2 weeks ago
- A modern, interlingual wordnet interface for Python ☆247 Updated this week
- Monolingual wordlists with pronunciation information in IPA ☆623 Updated last week
- hand-written dictionaries from the FreeDict project ☆420 Updated 7 months ago
- An efficient Python package for detecting and identifying English idiomatic expressions and phrases within sentences. ☆21 Updated last year
- A practical python library for identifying morphemes. ☆13 Updated 2 years ago
- Machine-readable Wiktionary ☆76 Updated last year
- Access a database of word frequencies, in various natural languages. ☆1,475 Updated 5 months ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆73 Updated 8 months ago
- Sentence aligner ☆113 Updated 4 years ago
- enchant spellchecking library ☆365 Updated this week
- Converts English text to IPA notation ☆383 Updated 2 years ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆72 Updated 5 months ago
- Most common sentences and words for all languages in the OpenSubtitles2018 corpus with Python code ☆35 Updated 3 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆160 Updated 3 weeks ago
- A Python parser for MediaWiki wikicode ☆798 Updated 2 months ago
- Find Chinese sentences based on your known vocabulary and other rules ☆62 Updated last year
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆49 Updated 7 months ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆340 Updated 3 years ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆30 Updated 5 years ago
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) ☆725 Updated last month
- Joe Speigle's Korean/English dictionary database ☆113 Updated last year
- An open etymology dataset created using Wiktionary data. Contains 3.8M entries, 1.8M terms, 2900 languages, and 31 unique relationship ty… ☆103 Updated last year