openzim / python-scraperlib
Collection of Python code to re-use across Python-based scrapers
☆21 · Updated last week
Alternatives and similar repositories for python-scraperlib:
Users who are interested in python-scraperlib are comparing it to the repositories listed below.
- Create a ZIM file from a YouTube channel/username/playlist ☆61 · Updated last week
- Turns a collection of documents into a browsable ZIM file ☆23 · Updated 2 months ago
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆49 · Updated 2 weeks ago
- Kiwix & openZIM build engine ☆94 · Updated last week
- Command line RSS feed reader and JSON/HTML/PDF/EPUB converter ☆23 · Updated 3 years ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆60 · Updated this week
- ActivityPub server without JavaScript, designed for simplicity and accessibility. Includes calendar, news and sharing economy features to… ☆64 · Updated this week
- Awesome links related to RSS, Atom, and syndication formats. ☆50 · Updated 6 months ago
- A tool to snapshot SQLite databases you don't own ☆19 · Updated 3 months ago
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆16 · Updated 3 months ago
- A powerful tool that converts voice recordings into high-quality Anki flashcards using AI-powered transcription and LLM processing, featu… ☆16 · Updated 3 weeks ago
- Easily back up your system with Bitwarden, BorgBase and Docker ☆13 · Updated 5 months ago
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆108 · Updated 3 weeks ago
- Gets your upvoted posts from Hacker News and imports them to raindrop.io ☆25 · Updated last year
- Various ZIM command line tools ☆146 · Updated last week
- ☆89 · Updated 4 months ago
- freeyourstuff.cc - universal content liberation ☆80 · Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆18 · Updated 11 months ago
- JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆39 · Updated 4 months ago
- The "hyp.is" service that takes a user to a URL with Hypothesis activated ☆49 · Updated 3 weeks ago
- Flancian's digital garden ☆22 · Updated this week
- Farm operated by bots to grow and harvest new ZIM files ☆91 · Updated 2 weeks ago
- LibriVox catalog and reader workflow application ☆40 · Updated last month
- List all your starred repositories in a single, Markdown-formatted page ☆14 · Updated 3 weeks ago
- Community spaces, consent, privacy, transparency, online. ☆21 · Updated last week
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆126 · Updated last week
- A WebFinger server for Facebook and Twitter. ☆20 · Updated 3 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆37 · Updated 2 weeks ago
- anagora.org/node/agora-bot ☆21 · Updated 3 weeks ago
- Export your GitHub activity: events, repositories, stars, etc. ☆48 · Updated last year