JustAnotherArchivist / qwarc
A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc
☆29 · Updated 4 years ago
Alternatives and similar repositories for qwarc
Users who are interested in qwarc compare it to the libraries listed below.
- Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication. ☆130 · Updated last month
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆185 · Updated last year
- Nondestructive warc-in-tar to warc conversion ☆27 · Updated 12 years ago
- Support for writing WARC files with Scrapy ☆22 · Updated 5 years ago
- A command line tool to archive a git repository from GitHub to the Internet Archive. ☆91 · Updated 4 years ago
- Python 3 tools for downloading and preserving wikis ☆121 · Updated 3 months ago
- Wombat.js client-side rewriting library ☆106 · Updated last month
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web. ☆177 · Updated last month
- Libzim binding for Python: read/write ZIM files in Python ☆94 · Updated last month
- Calculate PhotoDNA hashes using Python ☆43 · Updated 6 months ago
- Archiving URLs (outlinks) from a variety of sources. ☆23 · Updated last week
- A helper library full of URL-related heuristics. ☆73 · Updated 3 weeks ago
- A configurable, reusable tracker with dashboard ☆36 · Updated last year
- Archiving public telegram messages. ☆15 · Updated last month
- The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.… ☆24 · Updated 5 years ago
- Loadable spellfix1 extension for sqlite as python package ☆26 · Updated last year
- Saving all questions and answers from Yahoo! Answers. ☆50 · Updated 4 years ago
- Archiving GitHub ☆10 · Updated 2 months ago
- Scripts to build and boot warrior virtual machine containing Docker ☆120 · Updated 6 months ago
- Scrape Twitter API without authentication using Nitter. ☆65 · Updated 2 years ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆15 · Updated 4 years ago
- [Moved to Forgejo] An S3-to-S3 proxy (and more) implementing file-level deduplication and access control. ☆27 · Updated 2 months ago
- Create "perfect" snapshots of web pages ☆33 · Updated 2 months ago
- Farm operated by bots to grow and harvest new zim files ☆116 · Updated this week
- Python wrapper for the MediaWiki API to access and parse data from Wikipedia ☆42 · Updated last month
- Python API for neocities.org ☆82 · Updated last year
- Code to reproduce the Hacker News users fingerprinting with Burrows method ☆54 · Updated 6 months ago
- A polite and user-friendly downloader for Common Crawl data ☆56 · Updated 2 months ago
- Discord archiver ☆65 · Updated 2 years ago