openzim / python-libzim
Libzim binding for Python: read/write ZIM files in Python
☆92 · Updated 4 months ago
Alternatives and similar repositories for python-libzim
Users who are interested in python-libzim are comparing it to the libraries listed below.
- Various ZIM command line tools ☆171 · Updated 2 months ago
- Farm operated by bots to grow and harvest new zim files ☆111 · Updated last week
- A set of utilities for processing MediaWiki XML dump data. ☆57 · Updated 6 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆69 · Updated 5 months ago
- Put together a multilingual corpus from a variety of sources. Used for wordfreq and word embeddings. ☆53 · Updated 4 years ago
- Atom, RSS and JSON feed parser for Python 3 ☆117 · Updated 2 years ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆29 · Updated 4 years ago
- An experimental Python parser for MediaWiki syntax with a focus on extensibility and comprehensibility ☆61 · Updated 2 years ago
- Collection of Python code to re-use across Python-based scrapers ☆25 · Updated 3 months ago
- An easy to use offline reader for ZIM files right in your browser! ☆79 · Updated last year
- A polite and user-friendly downloader for Common Crawl data ☆53 · Updated 2 weeks ago
- Standalone version of Django's feedgenerator module ☆54 · Updated 2 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆48 · Updated this week
- Streaming WARC/ARC library for fast web archive IO ☆428 · Updated 8 months ago
- Python library for reading and writing warc files ☆244 · Updated 3 years ago
- ISO 639 library for Python ☆34 · Updated 11 months ago
- search interface for scholarly works ☆86 · Updated last year
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆182 · Updated 10 months ago
- A Python library to parse MediaWiki WikiText ☆312 · Updated 3 months ago
- A modern CSS selector implementation for BeautifulSoup ☆245 · Updated last month
- Kiwix & openZIM build engine ☆103 · Updated 2 months ago
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆41 · Updated 2 weeks ago
- Faster, modernized fork of the language identification tool langid.py ☆56 · Updated 9 months ago
- python library to validate, clean, transform and get metadata of ISBN strings (for devs). ☆269 · Updated last year
- Fast and robust date extraction from web pages, with Python or on the command-line ☆138 · Updated last month
- Python client library to interface with the MediaWiki API ☆333 · Updated last week
- Loadable spellfix1 extension for sqlite as python package ☆26 · Updated last year
- URL normalization for Python ☆97 · Updated 4 months ago
- fasttext with wheels and no external dependency, but only the predict method (<1MB) ☆17 · Updated 9 months ago
- A PDF classifier ensemble with REST API service ☆23 · Updated 4 years ago