openzim / python-libzim
Libzim binding for Python: read/write ZIM files in Python
☆97 Updated last month
Alternatives and similar repositories for python-libzim
Users who are interested in python-libzim are comparing it to the libraries listed below.
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆42 Updated 3 weeks ago
- Collection of Python code to re-use across Python-based scrapers ☆25 Updated last week
- A Python implementation of Lunr.js 🌖 ☆204 Updated 10 months ago
- A set of utilities for processing MediaWiki XML dump data. ☆61 Updated 11 months ago
- MediaWiki scraper: all your wiki articles in one highly compressed ZIM file ☆420 Updated this week
- A toolchain of tasks for sequencing and fingerprinting book fulltext ☆46 Updated last year
- Python bindings for Wasm3, a fast WebAssembly interpreter and the most universal WASM runtime ☆89 Updated last year
- Atom, RSS and JSON feed parser for Python 3 ☆118 Updated 3 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆55 Updated this week
- Standalone version of Django's feedgenerator module ☆55 Updated 5 months ago
- A Python library to parse MediaWiki WikiText ☆316 Updated 8 months ago
- Python API for PDF documents ☆124 Updated last year
- An easy to use offline reader for ZIM files right in your browser! ☆84 Updated 2 years ago
- Python client library to interface with the MediaWiki API ☆340 Updated last month
- A polite and user-friendly downloader for Common Crawl data ☆67 Updated 5 months ago
- A Python API to the Internet Archive Wayback Machine ☆84 Updated this week
- URL normalization for Python ☆99 Updated 9 months ago
- A Python library for working with and comparing language codes. ☆28 Updated last month
- A Python binding of SQLite Full Text Search Tokenizer ☆50 Updated 2 months ago
- modulegraph determines a dependency graph between Python modules primarily by bytecode analysis for import statements. modulegraph … ☆46 Updated last month
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆30 Updated 2 weeks ago
- A modern CSS selector implementation for BeautifulSoup ☆263 Updated this week
- SQLite3 DB-API 2.0 driver from Python 3, packaged separately, with improvements ☆231 Updated 3 weeks ago
- A low-level PDF creator ☆140 Updated 2 months ago
- Python library for reading and writing warc files ☆247 Updated 3 years ago
- An experimental Python parser for MediaWiki syntax with a focus on extensibility and comprehensibility ☆60 Updated 3 years ago
- Collection of core plugins for markdown-it-py ☆37 Updated last week
- Sort-friendly URI Reordering Transform (SURT) python module ☆44 Updated 4 months ago
- Loadable spellfix1 extension for sqlite as python package ☆27 Updated last year
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆58 Updated 4 years ago