openzim / python-libzim
Libzim binding for Python: read/write ZIM files in Python
☆94 · Updated last month
Alternatives and similar repositories for python-libzim
Users that are interested in python-libzim are comparing it to the libraries listed below
- Translate HTML using Argos Translate ☆53 · Updated 2 years ago
- Python API for PDF documents ☆124 · Updated last year
- A set of utilities for processing MediaWiki XML dump data. ☆57 · Updated 7 months ago
- Standalone version of Django's feedgenerator module ☆54 · Updated last month
- modulegraph determines a dependency graph between Python modules primarily by bytecode analysis for import statements. modulegraph … ☆46 · Updated 2 years ago
- Atom, RSS and JSON feed parser for Python 3 ☆117 · Updated 2 years ago
- Collection of Python code to re-use across Python-based scrapers ☆24 · Updated last week
- An easy to use offline reader for ZIM files right in your browser! ☆81 · Updated last year
- A modern CSS selector implementation for BeautifulSoup ☆248 · Updated last month
- A python package for grapheme aware string handling ☆115 · Updated 3 years ago
- SQLite3 DB-API 2.0 driver from Python 3, packaged separately, with improvements ☆219 · Updated 5 months ago
- Python client library to interface with the MediaWiki API ☆336 · Updated last month
- A Python library for working with and comparing language codes. ☆24 · Updated last month
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆221 · Updated last month
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆53 · Updated 4 years ago
- A polite and user-friendly downloader for Common Crawl data ☆56 · Updated last month
- An experimental Python parser for MediaWiki syntax with a focus on extensibility and comprehensibility ☆61 · Updated 3 years ago
- Training scripts for Argos Translate ☆141 · Updated last week
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆42 · Updated last month
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆29 · Updated 4 years ago
- Fast PDF generation and compression. Deals with millions of pages daily. ☆125 · Updated 3 weeks ago
- Faster, modernized fork of the language identification tool langid.py ☆59 · Updated 10 months ago
- A low-level PDF creator ☆138 · Updated last week
- python library to validate, clean, transform and get metadata of ISBN strings (for devs). ☆269 · Updated last year
- URL normalization for Python ☆98 · Updated 5 months ago
- fasttext with wheels and no external dependency, but only the predict method (<1MB) ☆18 · Updated 10 months ago
- search interface for scholarly works ☆86 · Updated last year
- A Python implementation of Lunr.js 🌖 ☆200 · Updated 7 months ago
- A toolchain of tasks for sequencing and fingerprinting book fulltext ☆45 · Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆50 · Updated 3 weeks ago