openzim / python-scraperlib
Collection of Python code to re-use across Python-based scrapers
☆22 Updated last week
Alternatives and similar repositories for python-scraperlib:
Users who are interested in python-scraperlib compare it to the libraries listed below.
- Turns a collection of documents into a browsable ZIM file ☆24 Updated 2 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆57 Updated last month
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆54 Updated 8 months ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆40 Updated this week
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆18 Updated 7 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆15 Updated 4 years ago
- Create a ZIM file from a Youtube channel/username/playlist ☆66 Updated last week
- Internet-in-a-Box (IIAB) Maps are like Google Maps but better, for schools especially, as they work offline (including satellite photos!)… ☆25 Updated 2 years ago
- Decentralized web archiving ☆20 Updated 6 years ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆21 Updated 9 months ago
- Run your own X, in a few clicks. ☆12 Updated 6 months ago
- A powerful tool that converts voice recordings into high-quality Anki flashcards using AI-powered transcription and LLM processing, featu… ☆20 Updated 3 months ago
- Libzim binding for Python: read/write ZIM files in Python ☆85 Updated 2 weeks ago
- ☆89 Updated 2 weeks ago
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆28 Updated 7 months ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 Updated 7 months ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆77 Updated last month
- [Moved to https://github.com/standardnotes/app] A code editor for Standard Notes with syntax highlighting support for over 120 programmin… ☆13 Updated 3 years ago
- 📦 Modern strongly typed Python library for managing system dependencies with package managers like apt, brew, pip, npm, etc. ☆17 Updated 3 weeks ago
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆21 Updated last year
- iFixit to ZIM scraper ☆30 Updated last month
- Tools to count the number of public domain and free to distribute movies registered in IMDB ☆24 Updated 5 years ago
- Generic automation tool around data stored as plaintext YAML files ☆34 Updated last month
- Chrome extension to add relative bookmarks to your browser ☆27 Updated 3 years ago
- The "hyp.is" service that takes a user to a URL with Hypothesis activated ☆50 Updated this week
- ☆7 Updated 2 years ago
- A simple framework for new and experienced Python programmers to create animations, games, and other graphics-based programs. Includes GU… ☆14 Updated 7 months ago
- Awesome links related to RSS, ATOM, and Syndication formats. ☆56 Updated 9 months ago
- list all your starred repositories into a single, markdown-formatted page ☆13 Updated this week