tatuylonen / wikitextprocessor
Python package for WikiMedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.
☆99 Updated this week
Alternatives and similar repositories for wikitextprocessor:
Users who are interested in wikitextprocessor compare it to the libraries listed below.
- A list of vocabulary lists ☆21 Updated 4 years ago
- Wiktionary parser tool for many language editions. ☆54 Updated 2 years ago
- A modern, interlingual wordnet interface for Python ☆244 Updated this week
- Machine-readable Wiktionary ☆76 Updated last year
- Sentence aligner ☆112 Updated 3 years ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆154 Updated 5 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 Updated 3 years ago
- A library for fetching and reading Tatoeba's weekly exports ☆22 Updated last year
- A versioned Python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆63 Updated 3 weeks ago
- A Python Wiktionary Parser ☆358 Updated 2 months ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆49 Updated 6 months ago
- This packages up data for the Open Multilingual Wordnet ☆48 Updated last week
- Python Finite-State Toolkit ☆54 Updated 2 months ago
- A cloud-based, open-source system for writing and publishing dictionaries. ☆89 Updated last year
- The Global WordNet Association Collaborative Inter-Lingual Index ☆42 Updated 5 months ago
- ☆72 Updated last month
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆28 Updated 5 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆44 Updated 2 years ago
- An NLP library for Uralic languages such as Finnish, Skolt Sami, Moksha and so on. Also supporting some non-Uralic languages such as Span… ☆80 Updated 5 months ago
- Wiktionary dump file parser and multilingual data extractor ☆900 Updated this week
- German Morphological Analyzer ☆47 Updated 3 years ago
- Morphological Dictionaries for German Language ☆29 Updated 7 years ago
- CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates ☆48 Updated last year
- A list of resources for conservation, development, and documentation of endangered, minority, and low or under-resourced human languages. ☆34 Updated 2 years ago
- universal tokenizer ☆17 Updated 3 years ago
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆64 Updated last week
- German part-of-speech dictionary ☆45 Updated last year
- A Python library to parse MediaWiki WikiText ☆307 Updated 6 months ago
- A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning tech… ☆72 Updated 4 months ago
- Java Wiktionary Library ☆57 Updated 2 years ago