openzim / python-libzim
Libzim binding for Python: read/write ZIM files in Python
☆76 · Updated this week
Alternatives and similar repositories for python-libzim:
Users who are interested in python-libzim are comparing it to the libraries listed below.
- Collection of Python code to re-use across Python-based scrapers ☆22 · Updated this week
- Farm operated by bots to grow and harvest new zim files ☆96 · Updated this week
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆52 · Updated this week
- Loadable spellfix1 extension for sqlite as python package ☆26 · Updated 10 months ago
- A set of utilities for processing MediaWiki XML dump data. ☆50 · Updated last week
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆39 · Updated 8 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 · Updated 3 years ago
- Standalone version of Django's feedgenerator module ☆52 · Updated 10 months ago
- Training scripts for Argos Translate ☆128 · Updated 3 months ago
- Translate HTML using Argos Translate ☆50 · Updated last year
- A robust web archive analytics toolkit ☆99 · Updated 2 months ago
- fasttext with wheels and no external dependency, but only the predict method (<1MB) ☆13 · Updated 2 months ago
- Fast and robust date extraction from web pages, with Python or on the command-line ☆122 · Updated last month
- Automated behaviors that run in the browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆39 · Updated this week
- A Python implementation of Lunr.js 🌖 ☆195 · Updated last month
- A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode. ☆260 · Updated 2 months ago
- Diff Match Patch is a high-performance library in multiple languages that manipulates plain text. ☆52 · Updated last month
- A sentence segmentation library with wide language support optimized for speed and utility. ☆58 · Updated 5 months ago
- wabac.js - Web Archive Browsing Augmentation Client ☆106 · Updated last week
- Python Simple Object Storage - provides a list and dictionary interface that seamlessly stores data in a file, like a simplified database… ☆57 · Updated 2 years ago
- A toolchain of tasks for sequencing and fingerprinting book fulltext ☆43 · Updated 6 months ago
- Faster, modernized fork of the language identification tool langid.py ☆53 · Updated 3 months ago
- Python package for WikiMedia dump processing (Wiktionary, Wikipedia etc). Wikitext parsing, template expansion, Lua module execution. Fo… ☆97 · Updated this week
- Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. ☆161 · Updated last month
- Turns a collection of documents into a browsable ZIM file ☆24 · Updated this week
- Search interface for scholarly works ☆83 · Updated 6 months ago
- ISO 639 library for Python ☆32 · Updated 5 months ago
- Generation of bilingual dictionaries from Wiktionary/dbnary data for the WikDict project ☆46 · Updated 3 months ago
- Python binding to Poppler-cpp pdf library ☆105 · Updated 5 months ago