ArchiveTeam / grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
☆1,536 · Updated 5 months ago
Alternatives and similar repositories for grab-site
Users who are interested in grab-site compare it to the repositories listed below.
- Wget-compatible web downloader and crawler. ☆595 · Updated last year
- Core Python Web Archiving Toolkit for replay and recording of web archives ☆1,576 · Updated this week
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆912 · Updated this week
- Collect and revisit web pages. ☆1,525 · Updated 10 months ago
- brozzler - distributed browser-based web crawler ☆755 · Updated 2 weeks ago
- Serverless replay of web archives directly in the browser ☆856 · Updated last week
- A Python and Command-Line Interface to Archive.org ☆1,785 · Updated last week
- Use yt-dlp to download video/metadata and upload to the Internet Archive. ☆463 · Updated 2 weeks ago
- ArchiveBot, an IRC bot for archiving websites ☆402 · Updated 3 months ago
- Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2025, WikiTeam has preserved more th… ☆802 · Updated 7 months ago
- List of data-hoarding related tools ☆1,239 · Updated 2 years ago
- An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services includ… ☆2,089 · Updated last week
- Self-Hosted Bookmark And Archive Manager ☆1,828 · Updated last year
- Indexes open directories ☆1,280 · Updated 2 months ago
- Starting point for archiving entire YouTube channels using yt-dlp (originally youtube-dl) ☆508 · Updated last month
- Lightning-fast file system indexer and search tool ☆1,163 · Updated 4 months ago
- Chrome extension to "Create WARC files from any webpage" ☆224 · Updated last year
- Utilities for dealing with Tumblr blogs, Tumblr backup ☆686 · Updated 9 months ago
- The personal, minimalist, super-fast, database-free bookmarking service - community repo ☆3,738 · Updated 2 months ago
- A curated list of awesome tools for website diffing and change monitoring. ☆511 · Updated 3 weeks ago
- I consume the world via RSS feeds, and this is my attempt to keep it that way. ☆802 · Updated last week
- A Dockerfile for the ArchiveTeam Warrior ☆414 · Updated 2 months ago
- CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile) ☆1,033 · Updated last week
- Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matc… ☆817 · Updated 8 months ago
- RSS generator website ☆397 · Updated 2 years ago
- Archive Reddit data as offline-friendly web pages ☆174 · Updated 5 years ago
- 😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, B… ☆379 · Updated 5 months ago