ArchiveTeam / grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
☆1,473 Updated 9 months ago
Alternatives and similar repositories for grab-site:
Users who are interested in grab-site are comparing it to the repositories listed below:
- Wget-compatible web downloader and crawler. ☆582 Updated 11 months ago
- Core Python Web Archiving Toolkit for replay and recording of web archives ☆1,486 Updated last week
- brozzler - distributed browser-based web crawler ☆699 Updated 2 weeks ago
- Collect and revisit web pages. ☆1,497 Updated 3 months ago
- A Python and Command-Line Interface to Archive.org ☆1,695 Updated 3 weeks ago
- An Awesome List for getting started with web archiving ☆2,216 Updated last week
- An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services includ… ☆1,931 Updated this week
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆748 Updated this week
- ArchiveBot, an IRC bot for archiving websites ☆380 Updated last week
- Serverless replay of web archives directly in the browser ☆784 Updated last month
- Use yt-dlp to download video/metadata and upload to the Internet Archive. ☆446 Updated 2 weeks ago
- Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2025, WikiTeam has preserved more th… ☆761 Updated 3 weeks ago
- WARC writing MITM HTTP/S proxy ☆401 Updated 2 weeks ago
- List of data-hoarding related tools ☆1,157 Updated last year
- Self-Hosted Bookmark And Archive Manager ☆1,806 Updated 11 months ago
- Make a ZIM file from any Web site and surf offline! ☆503 Updated this week
- Lightning-fast file system indexer and search tool ☆1,031 Updated 3 weeks ago
- CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile) ☆763 Updated 2 weeks ago
- Extremely fast tool to remove duplicates and other lint from your filesystem ☆2,069 Updated 3 weeks ago
- Offline Internet Archive project ☆286 Updated last year
- 💾 dn - offline full-text search and archiving for your Chromium-based browser. ☆3,830 Updated last month
- Download an entire website from the Wayback Machine. ☆5,529 Updated last year
- Efficient Duplicate File Finder ☆2,249 Updated last month
- Official repo for par2cmdline and libpar2 ☆760 Updated this week
- The personal, minimalist, super-fast, database free, bookmarking service - community repo ☆3,611 Updated last week
- find duplicate files utility ☆1,067 Updated last month
- Utilities for dealing with Tumblr blogs, Tumblr backup ☆679 Updated 2 months ago
- A self-hosted, anti-social RSS reader. ☆4,007 Updated this week
- Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox. ☆297 Updated 3 weeks ago
- Starting point for archiving entire YouTube channels using yt-dlp (originally youtube-dl) ☆499 Updated 2 years ago