openzim / python-scraperlib
Collection of Python code to re-use across Python-based scrapers
☆24 Updated 3 weeks ago
Alternatives and similar repositories for python-scraperlib
Users who are interested in python-scraperlib are comparing it to the libraries listed below.
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆15 Updated 5 years ago
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆59 Updated last year
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… ☆18 Updated last year
- Convert an online sitemap to Atom, RSS and JSON feeds ☆61 Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- Awesome links related to RSS, ATOM, and Syndication formats. ☆60 Updated last year
- Create a ZIM file from a Youtube channel/username/playlist ☆79 Updated last week
- ActivityPub server without Javascript, designed for simplicity and accessibility. Includes calendar, news and sharing economy features to… ☆71 Updated this week
- Command line RSS feed reader and json/html/pdf/epub converter ☆25 Updated 3 years ago
- Community spaces, consent, privacy, transparency, online. ☆27 Updated last month
- Command line tool to convert a file in the WARC format to a file in the ZIM format ☆71 Updated 7 months ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 Updated last year
- Gemini IPFS Gateway ☆22 Updated 6 months ago
- Turns a collection of documents into a browsable ZIM file ☆26 Updated 2 weeks ago
- Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing… ☆97 Updated last week
- anagora.org/node/agora-bot ☆22 Updated last month
- Chrome extension to add relative bookmarks to your browser ☆27 Updated 3 years ago
- Farm operated by bots to grow and harvest new zim files ☆118 Updated this week
- searchmysite.net is an open source search engine and search as a service ☆134 Updated 3 weeks ago
- This is the HeadQuarters of my digital info. HPI library got me inspired and I'm trying to play with the idea on a smaller scale for myse… ☆21 Updated last year
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated last year
- Write and read comments on every page with a simple plug-in for your browser ☆49 Updated 3 years ago
- Self-hostable link database ☆125 Updated this week
- This is the web portal for Snikket Chat services. To learn more about what Snikket Chat services are, check the website. ☆42 Updated 2 weeks ago
- Offline voice-controlled music player for Raspberry Pi ☆10 Updated last year
- ☆89 Updated 6 months ago
- Mastodon bot flying from user to user ☆12 Updated 7 years ago
- ☆26 Updated last year
- Kiwix & openZIM build engine ☆107 Updated last week
- A social media RSS: peer-to-peer, offline ActivityPub client for reading and following microblogs on the Fediverse. ☆16 Updated 10 months ago