openzim / python-scraperlib
Collection of Python code to re-use across Python-based scrapers
☆24 · Updated last week
Alternatives and similar repositories for python-scraperlib
Users who are interested in python-scraperlib are comparing it to the libraries listed below.
- Command line tool to convert a file in the WARC format to a file in the ZIM format · ☆75 · Updated 8 months ago
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. · ☆15 · Updated 5 years ago
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A… · ☆18 · Updated 2 weeks ago
- An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. · ☆59 · Updated last year
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… · ☆19 · Updated last year
- Convert an online sitemap to Atom, RSS and JSON feeds · ☆61 · Updated 2 years ago
- Turns a collection of documents into a browsable ZIM file · ☆26 · Updated last month
- ActivityPub server without Javascript, designed for simplicity and accessibility. Includes calendar, news and sharing economy features to… · ☆73 · Updated this week
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. · ☆130 · Updated 3 months ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… · ☆40 · Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. · ☆52 · Updated 2 weeks ago
- Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival. · ☆31 · Updated last year
- Awesome links related to RSS, ATOM, and Syndication formats. · ☆61 · Updated last year
- Libzim binding for Python: read/write ZIM files in Python · ☆95 · Updated this week
- Create a ZIM file from a Youtube channel/username/playlist · ☆82 · Updated 3 weeks ago
- VLC remote control web interface · ☆18 · Updated 2 years ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. · ☆13 · Updated last year
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… · ☆55 · Updated 3 months ago
- The ArchiveWeb.page Site · ☆30 · Updated last month
- Community spaces, consent, privacy, transparency, online. · ☆27 · Updated last week
- Your "yellow pages" of Enterprise Free Software Publishers, their products and success cases · ☆17 · Updated last year
- RSS feeds in public. · ☆15 · Updated last month
- Stupidly simple DIY web archiving tool · ☆33 · Updated 9 months ago
- A list of things related to software, literature, and other content for Memento · ☆102 · Updated last year
- Various ZIM command line tools · ☆180 · Updated last month
- anonymous CLI for reading microblogging (chiefly Mastodon) posts · ☆19 · Updated last month
- Archiving public telegram messages. · ☆16 · Updated 3 months ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc · ☆30 · Updated 4 years ago
- Specification for the NameDrop DNS delegation protocol · ☆28 · Updated 8 months ago
- ☆90 · Updated 7 months ago