openzim / python-scraperlib
Collection of Python code to re-use across Python-based scrapers
☆25 Updated 4 months ago
Alternatives and similar repositories for python-scraperlib
Users who are interested in python-scraperlib are comparing it to the libraries listed below.
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆18 Updated 11 months ago
- ActivityPub server without Javascript, designed for simplicity and accessibility. Includes calendar, news and sharing economy features to… ☆71 Updated last week
- Turns a collection of documents into a browsable ZIM file ☆26 Updated 6 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆15 Updated 4 years ago
- Create a ZIM file from a Youtube channel/username/playlist ☆78 Updated last week
- Awesome links related to RSS, ATOM, and Syndication formats. ☆59 Updated last year
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆70 Updated 5 months ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆58 Updated last year
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆129 Updated 3 weeks ago
- Convert an online sitemap to Atom, RSS and JSON feeds ☆61 Updated last year
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 Updated last year
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆90 Updated last month
- Mastodon bot flying from user to user ☆12 Updated 7 years ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆99 Updated last year
- ⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 … ☆82 Updated 3 weeks ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. ☆29 Updated last year
- searchmysite.net is an open source search engine and search as a service ☆134 Updated last week
- Command line RSS feed reader and json/html/pdf/epub converter ☆25 Updated 3 years ago
- Kiwix & openZIM build engine ☆103 Updated 2 months ago
- RSS feeds in public. ☆14 Updated 4 months ago
- Your "yellow pages" of Enterprise Free Software Publishers, their products and success cases ☆17 Updated last year
- Homebrew formula for the ArchiveBox self-hosted internet archiving solution. ☆29 Updated 11 months ago
- HTML tables are underrated ☆21 Updated 2 weeks ago
- Submit websites to be crawled by Marginalia Search here ☆51 Updated last week
- Application to communicate with SEPIA via browser, iOS and Android. Works as chat messenger with personal-assistant, ASR and TTS integrat… ☆67 Updated 5 months ago
- A social media RSS: peer-to-peer, offline ActivityPub client for reading and following microblogs on the Fediverse. ☆15 Updated 9 months ago
- Specification for the NameDrop DNS delegation protocol ☆26 Updated 5 months ago
- Gemini IPFS Gateway ☆22 Updated 4 months ago
- YOLOv9 object and fire detection for IP security cameras in Python3 ☆19 Updated 11 months ago