tatuylonen / wikitextprocessor
Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.
☆97 · Updated this week
Alternatives and similar repositories for wikitextprocessor:
Users who are interested in wikitextprocessor compare it to the libraries listed below:
- Wiktionary parser tool for many language editions. ☆54 · Updated 2 years ago
- A Python Wiktionary Parser ☆357 · Updated last year
- Wiktionary dump file parser and multilingual data extractor ☆856 · Updated this week
- A modern, interlingual wordnet interface for Python ☆232 · Updated 2 weeks ago
- Machine-readable Wiktionary ☆75 · Updated 9 months ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆46 · Updated 3 months ago
- A Python library to parse MediaWiki WikiText ☆299 · Updated 4 months ago
- Sentence aligner ☆109 · Updated 3 years ago
- A list of vocabulary lists ☆21 · Updated 4 years ago
- A versioned Python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆61 · Updated last month
- Simple multilingual lemmatizer for Python, especially useful for speed and efficiency ☆151 · Updated 3 months ago
- ☆71 · Updated 2 weeks ago
- This packages up data for the Open Multilingual Wordnet ☆45 · Updated last week
- Multilingual sentence alignment using sentence embeddings ☆108 · Updated 3 months ago
- Efficient teacher-student models and scripts to make them ☆49 · Updated last year
- An advanced, extensible web front-end for the Manatee-open corpus search engine ☆64 · Updated this week
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆26 · Updated 5 years ago
- The Global WordNet Association Collaborative Inter-Lingual Index ☆41 · Updated 3 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆51 · Updated 3 years ago
- Offline bilingual dictionaries made using data from Wiktionary ☆52 · Updated 9 years ago
- The source of the phonetic transcriptions is Oxford Advanced Learner's Dictionary (3rd ed.), available from the Oxford Text Archive (http… ☆23 · Updated 7 years ago
- Tools for scraping, annotating, and parsing morphological information from Wiktionary ☆13 · Updated 5 years ago
- Python API to access glottolog/glottolog ☆29 · Updated 3 months ago
- An easy-to-use library to linguistically compare one sentence and its words to another, in the same language or a different one. For inst… ☆22 · Updated 3 years ago
- Collaborative data curation for Glottolog ☆156 · Updated this week
- A sentence segmentation library with wide language support optimized for speed and utility. ☆57 · Updated 5 months ago
- Transform TMX to text ☆28 · Updated 2 years ago
- Real-world example demonstrating advanced techniques to unmarshal very large XML documents with a very low memory footprint. ☆59 · Updated last year
- A Python module for word inflections designed for use with spaCy. ☆92 · Updated 5 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆70 · Updated 9 months ago