tatuylonen / wiktextract
Wiktionary dump file parser and multilingual data extractor
☆1,030 Updated this week
Alternatives and similar repositories for wiktextract
Users who are interested in wiktextract are comparing it to the libraries listed below.
- A Python Wiktionary Parser ☆367 Updated 3 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆108 Updated last week
- The Open English WordNet ☆643 Updated 3 weeks ago
- Gather modern English word frequencies from all enwiki articles. ☆226 Updated last year
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆177 Updated 4 months ago
- Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations. ☆807 Updated this week
- A modern, interlingual wordnet interface for Python ☆266 Updated last month
- Monolingual wordlists with pronunciation information in IPA ☆678 Updated 5 months ago
- hand-written dictionaries from the FreeDict project ☆443 Updated 3 months ago
- A library for fetching and reading Tatoeba's weekly exports ☆24 Updated last year
- Offline bilingual dictionaries made using data from Wiktionary ☆61 Updated 10 years ago
- Access a database of word frequencies, in various natural languages. ☆1,559 Updated 9 months ago
- Repository for Frequency Word List Generator and processed files ☆1,376 Updated 3 years ago
- Sentence aligner ☆118 Updated 4 years ago
- English Lemma Database - Compiled by Referencing British National Corpus ☆32 Updated last year
- Machine-readable Wiktionary ☆77 Updated last year
- Verbe Complete Conjugator (verbecc) supports Catalan, Spanish, French, Italian, Portuguese and Romanian and can predict conjugation for u… ☆98 Updated this week
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat… ☆159 Updated 10 months ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆75 Updated last month
- Machine-readable lists of lemma-token pairs in 23 languages. ☆346 Updated 3 years ago
- Bitextor generates translation memories from multilingual websites ☆296 Updated 11 months ago
- Kanji usage frequency data collected from various sources ☆150 Updated last week
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆51 Updated 2 years ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆73 Updated 10 months ago
- HSK 3.0 Vocabulary Lists (words and characters) ☆89 Updated last year
- Proxy to convert HTML responses from linguee.com to JSON format ☆204 Updated last year
- Find Chinese sentences based on your known vocabulary and other rules ☆64 Updated last year
- LingPy: Python library for quantitative tasks in historical linguistics ☆137 Updated 3 months ago
- ☆12 Updated this week
- A cloud-based, open-source system for writing and publishing dictionaries. ☆95 Updated last year