5j9 / wikitextparser
A Python library to parse MediaWiki WikiText
☆299 · Updated 3 months ago
Alternatives and similar repositories for wikitextparser:
Users interested in wikitextparser are comparing it to the libraries listed below.
- A Python parser for MediaWiki wikicode ☆780 · Updated last month
- Wikidata client library for Python ☆345 · Updated 7 months ago
- A modern, interlingual wordnet interface for Python ☆233 · Updated last week
- A Python library for working with and comparing language codes. ☆342 · Updated 2 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 · Updated this week
- Streaming WARC/ARC library for fast web archive IO ☆398 · Updated 2 months ago
- Cython wrapper on Hunspell Dictionary ☆66 · Updated 7 months ago
- Python3 bindings for the Compact Language Detector v3 (CLD3) ☆149 · Updated last year
- Python client library to interface with the MediaWiki API ☆324 · Updated 2 weeks ago
- Morfessor is a tool for unsupervised and semi-supervised morphological segmentation ☆188 · Updated 4 years ago
- A Python truecasing utility that restores case information for texts ☆88 · Updated 2 years ago
- A Python Wiktionary Parser ☆358 · Updated last year
- Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic fe… ☆170 · Updated 3 years ago
- ☆167 · Updated 8 months ago
- Universal Dependencies online documentation ☆281 · Updated this week
- A versioned Python wrapper package for cmudict (https://github.com/cmusphinx/cmudict). ☆61 · Updated last month
- Python tools for interacting with Wikidata ☆150 · Updated last year
- The Global WordNet Association Collaborative Inter-Lingual Index ☆41 · Updated 3 months ago
- Python module that identifies Chinese text as being Simplified or Traditional ☆89 · Updated 2 months ago
- A Python module for English lemmatization and inflection. ☆265 · Updated last year
- A set of utilities for processing MediaWiki XML dump data. ☆50 · Updated this week
- A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary. ☆312 · Updated last month
- Various utilities for processing the data. ☆207 · Updated this week
- Compute PageRank on >3 billion Wikipedia links on off-the-shelf hardware. ☆57 · Updated 3 months ago
- A Python module for word inflections designed for use with spaCy. ☆92 · Updated 5 years ago
- Master repo for the UniMorph project, includes the UniMorph schema and annotated data files ☆27 · Updated 5 years ago
- Python Finite-State Toolkit ☆49 · Updated 3 weeks ago
- Text tokenization and sentence segmentation (segtok v2) ☆202 · Updated 2 years ago
- A character-wise tokenizer for morphologically rich languages ☆27 · Updated last month
- A toolkit for CDX indices such as Common Crawl and the Internet Archive's Wayback Machine ☆166 · Updated last month