tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆1,015 Updated this week
Alternatives and similar repositories for wiktextract
Users who are interested in wiktextract compare it to the libraries listed below.
- A Python Wiktionary Parser ☆364 Updated 2 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆107 Updated 2 weeks ago
- The Open English WordNet ☆634 Updated last week
- A modern, interlingual wordnet interface for Python ☆260 Updated last month
- Monolingual wordlists with pronunciation information in IPA ☆673 Updated 4 months ago
- English Lemma Database - Compiled by Referencing British National Corpus ☆32 Updated last year
- A Python library to parse MediaWiki WikiText ☆313 Updated 4 months ago
- Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations. ☆798 Updated last week
- Access a database of word frequencies, in various natural languages. ☆1,546 Updated 9 months ago
- Gather modern English word frequencies from all enwiki articles. ☆224 Updated last year
- A Python parser for MediaWiki wikicode ☆835 Updated 3 months ago
- Machine-readable Wiktionary ☆77 Updated last year
- hand-written dictionaries from the FreeDict project ☆437 Updated 2 months ago
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat… ☆159 Updated 9 months ago
- Machine-readable lists of lemma-token pairs in 23 languages. ☆343 Updated 3 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆175 Updated 4 months ago
- Offline bilingual dictionaries made using data from Wiktionary ☆58 Updated 10 years ago
- A library for fetching and reading Tatoeba's weekly exports ☆24 Updated last year
- HSK 3.0 Vocabulary Lists (words and characters) ☆87 Updated last year
- The Open Source Dictionary ☆574 Updated 6 months ago
- Sentence aligner ☆117 Updated 4 years ago
- Open German WordNet ☆97 Updated last week
- Open Language Profiles - English profile datasets from CEFR-J ☆149 Updated 5 years ago
- Word/n-gram frequency lists for the Google Books Ngram Corpus (v3, all languages) with Python code ☆93 Updated 2 years ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆75 Updated last month
- Bitextor generates translation memories from multilingual websites ☆295 Updated 10 months ago
- Spanish to English dictionary, frequency list, and lemma data ☆36 Updated this week
- VerbeCompleteConjugator supports Catalan, Spanish, French, Italian, Portuguese and Romanian and can predict conjugation for unknown verbs… ☆96 Updated this week
- LingPy: Python library for quantitative tasks in historical linguistics ☆137 Updated 2 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆48 Updated 2 years ago