tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆940 · Updated last week
Alternatives and similar repositories for wiktextract
Users who are interested in wiktextract compare it to the libraries listed below.
- A Python Wiktionary Parser ☆361 · Updated 4 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆102 · Updated last month
- Gather modern English word frequencies from all enwiki articles. ☆216 · Updated last year
- hand-written dictionaries from the FreeDict project ☆420 · Updated 8 months ago
- A modern, interlingual wordnet interface for Python ☆251 · Updated this week
- Machine-readable Wiktionary ☆76 · Updated last year
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code ☆76 · Updated last year
- Access a database of word frequencies, in various natural languages. ☆1,491 · Updated 5 months ago
- The Open English WordNet ☆576 · Updated this week
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆164 · Updated 2 weeks ago
- Universal Dependencies online documentation ☆285 · Updated this week
- Monolingual wordlists with pronunciation information in IPA ☆632 · Updated last month
- Inflecting Finnish words (verb inflection, comparatives, cases, possessive suffixes, clitics) using Wiktionary-compatible declensions and… ☆32 · Updated 4 years ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆72 · Updated 6 months ago
- A python module for English lemmatization and inflection. ☆268 · Updated last year
- Collaborative data curation for Glottolog ☆165 · Updated last week
- Bitextor generates translation memories from multilingual websites ☆293 · Updated 7 months ago
- Spanish to English dictionary, frequency list, and lemma data ☆33 · Updated this week
- The Global WordNet Association Collaborative Inter-Lingual Index ☆43 · Updated 7 months ago
- Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations. ☆770 · Updated this week
- English Lemma Database - Compiled by Referencing British National Corpus ☆31 · Updated 9 months ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆340 · Updated 3 years ago
- Helsinki Finite-State Technology (library and application suite) ☆131 · Updated last month
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆380 · Updated 7 months ago
- A tool for transcribing orthographic text as IPA (International Phonetic Alphabet) ☆728 · Updated 2 months ago
- WordNet in JSON format. ☆91 · Updated 4 years ago
- Converts English text to IPA notation ☆388 · Updated 2 years ago
- Automatically exported from code.google.com/p/foma ☆122 · Updated 4 months ago
- German part-of-speech dictionary ☆45 · Updated last year
- Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German ☆487 · Updated 7 months ago