tatuylonen / wikitextprocessor
Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.
☆97 · Updated last week
Alternatives and similar repositories for wikitextprocessor:
Users who are interested in wikitextprocessor are comparing it to the libraries listed below:
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆154 · Updated 4 months ago
- A Python Wiktionary Parser ☆357 · Updated last month
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- Machine-readable Wiktionary ☆76 · Updated 11 months ago
- The Global WordNet Association Collaborative Inter-Lingual Index ☆42 · Updated 5 months ago
- The Open Multilingual Wordnet ☆61 · Updated 11 months ago
- WordNet-LMF formats ☆21 · Updated last month
- A modern, interlingual wordnet interface for Python ☆238 · Updated this week
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆49 · Updated 5 months ago
- A cloud-based, open-source system for writing and publishing dictionaries. ☆89 · Updated last year
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆64 · Updated this week
- This packages up data for the Open Multilingual Wordnet ☆47 · Updated last month
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- A Python library to parse MediaWiki WikiText ☆305 · Updated 5 months ago
- English Resource Grammar ☆20 · Updated 8 months ago
- Wiktionary dump file parser and multilingual data extractor ☆881 · Updated this week
- ☆73 · Updated 2 weeks ago
- Lexical data at Unicode ☆68 · Updated 7 months ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆27 · Updated 5 years ago
- A versioned python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆62 · Updated last month
- A list of vocabulary lists ☆21 · Updated 4 years ago
- Faster, modernized fork of the language identification tool langid.py ☆55 · Updated 4 months ago
- Sentence aligner ☆112 · Updated 3 years ago
- Pipeline to generate the Standardized Project Gutenberg Corpus ☆177 · Updated last year
- These are lists for a variety of languages containing words that are distinctive to each language. ☆37 · Updated 3 years ago
- Python Finite-State Toolkit ☆54 · Updated last month
- University of Colorado VerbNet ☆104 · Updated 10 months ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation) ☆43 · Updated 2 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆63 · Updated 10 months ago
- Helsinki Finite-State Technology (library and application suite) ☆129 · Updated this week