tatuylonen / wikitextprocessor
Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.
☆105 · Updated last week
Alternatives and similar repositories for wikitextprocessor
Users who are interested in wikitextprocessor compare it to the libraries listed below.
- A modern, interlingual wordnet interface for Python ☆255 · Updated 3 weeks ago
- A Python Wiktionary Parser ☆362 · Updated 2 weeks ago
- This packages up data for the Open Multilingual Wordnet ☆50 · Updated 2 months ago
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆168 · Updated 2 months ago
- A Python library to parse MediaWiki WikiText ☆310 · Updated 2 months ago
- A list of vocabulary lists ☆21 · Updated 5 years ago
- Machine-readable Wiktionary ☆76 · Updated last year
- Wiktionary dump file parser and multilingual data extractor ☆964 · Updated this week
- The Open Multilingual Wordnet ☆63 · Updated last year
- A Python module for English lemmatization and inflection. ☆268 · Updated last year
- ☆74 · Updated 4 months ago
- Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies) ☆30 · Updated last month
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆15 · Updated 5 years ago
- A cloud-based, open-source system for writing and publishing dictionaries. ☆93 · Updated last year
- Gather modern English word frequencies from all enwiki articles. ☆220 · Updated last year
- Morphological Dictionaries for German Language ☆29 · Updated 7 years ago
- The Open English WordNet ☆598 · Updated last month
- Sentence aligner ☆116 · Updated 4 years ago
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- Faster, modernized fork of the language identification tool langid.py ☆56 · Updated 8 months ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆32 · Updated 5 years ago
- Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder. ☆250 · Updated 2 years ago
- A list of ~100,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the dat… ☆155 · Updated 7 months ago
- Python Finite-State Toolkit ☆57 · Updated last week
- The Global WordNet Association Collaborative Inter-Lingual Index ☆44 · Updated 8 months ago
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆72 · Updated last week
- Latin BERT ☆66 · Updated last year
- Lexical data at Unicode ☆68 · Updated 11 months ago
- Java Wiktionary Library ☆57 · Updated 2 years ago
- The World Atlas of Language Structures ☆61 · Updated 9 months ago