openzim / python-libzim
Libzim binding for Python: read/write ZIM files in Python
☆97 · Updated last month
Alternatives and similar repositories for python-libzim
Users who are interested in python-libzim compare it to the libraries listed below.
- Various ZIM command line tools ☆185 · Updated 2 months ago
- A set of utilities for processing MediaWiki XML dump data. ☆61 · Updated 11 months ago
- Atom, RSS and JSON feed parser for Python 3 ☆117 · Updated 3 years ago
- Collection of Python code to re-use across Python-based scrapers ☆25 · Updated this week
- An experimental Python parser for MediaWiki syntax with a focus on extensibility and comprehensibility ☆60 · Updated 3 years ago
- Farm operated by bots to grow and harvest new zim files ☆185 · Updated last week
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆107 · Updated 2 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆57 · Updated 4 years ago
- A Python library to parse MediaWiki WikiText ☆315 · Updated 8 months ago
- Loadable spellfix1 extension for sqlite as python package ☆27 · Updated last year
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆42 · Updated 2 weeks ago
- Fast PDF generation and compression. Deals with millions of pages daily. ☆133 · Updated 3 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆55 · Updated 2 months ago
- search interface for scholarly works ☆85 · Updated last year
- A Python implementation of Lunr.js 🌖 ☆203 · Updated 10 months ago
- A python package for grapheme aware string handling ☆115 · Updated 3 years ago
- Python client library to interface with the MediaWiki API ☆340 · Updated last month
- ISO 639 library for Python ☆35 · Updated last year
- Streaming WARC/ARC library for fast web archive IO ☆442 · Updated last year
- fasttext with wheels and no external dependency, but only the predict method (<1MB) ☆19 · Updated last year
- URL normalization for Python ☆99 · Updated 9 months ago
- A toolchain of tasks for sequencing and fingerprinting book fulltext ☆46 · Updated last year
- A Python API to the Internet Archive Wayback Machine ☆84 · Updated last week
- Kiwix & openZIM build engine ☆116 · Updated this week
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆226 · Updated last month
- modulegraph determines a dependency graph between Python modules primarily by bytecode analysis for import statements. modulegraph … ☆46 · Updated 3 weeks ago
- A modern CSS selector implementation for BeautifulSoup ☆263 · Updated last week
- Python bindings for Wasm3, a fast WebAssembly interpreter and the most universal WASM runtime ☆89 · Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆188 · Updated last week
- image-to-text model for PDF.js ☆50 · Updated 10 months ago