ArchiveTeam/wpull

Wget-compatible web downloader and crawler.

☆611

Alternatives and similar repositories for wpull

Users that are interested in wpull are comparing it to the libraries listed below.

ArchiveTeam / grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
☆1,601 · May 23, 2025 · Updated last year
ArchiveTeam / ArchiveBot
ArchiveBot, an IRC bot for archiving websites
☆418 · Apr 17, 2026 · Updated 3 months ago
ArchiveTeam / NewsGrabber
Grabbing all news.
☆60 · Dec 23, 2019 · Updated 6 years ago
ArchiveTeam / ludios_wpull
wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
☆31 · Sep 20, 2025 · Updated 10 months ago
internetarchive / warcprox
WARC writing MITM HTTP/S proxy
☆456 · Jun 17, 2026 · Updated last month
chfoo / warcat
Tool and library for handling Web ARChive (WARC) files.
☆165 · Oct 11, 2024 · Updated last year
webrecorder / pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
☆1,680 · Apr 10, 2026 · Updated 3 months ago
internetarchive / brozzler
brozzler - distributed browser-based web crawler
☆809 · Jul 7, 2026 · Updated last week
Rhizome-Conifer / conifer
Collect and revisit web pages.
☆1,542 · May 12, 2026 · Updated 2 months ago
ArchiveTeam / wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
☆137 · Mar 19, 2026 · Updated 4 months ago
webrecorder / webrecorder-player
Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder)
☆445 · Sep 17, 2020 · Updated 5 years ago
ArchiveTeam / telegram-grab
Archiving public telegram messages.
☆17 · Jul 5, 2026 · Updated 2 weeks ago
iipc / awesome-web-archiving
An Awesome List for getting started with web archiving
☆2,605 · Apr 27, 2026 · Updated 2 months ago
odie5533 / WarcMiddleware
WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.
☆48 · Mar 19, 2018 · Updated 8 years ago
iipc / openwayback
The OpenWayback Development
☆521 · Jan 3, 2024 · Updated 2 years ago
PromyLOPh / crocoite
Web archiving using Google Chrome
☆45 · Dec 30, 2019 · Updated 6 years ago
webrecorder / warcit
Convert Directories, Files and ZIP Files to Web Archives (WARC)
☆99 · Apr 22, 2025 · Updated last year
alard / megawarc
Nondestructive warc-in-tar to warc conversion
☆27 · Apr 21, 2013 · Updated 13 years ago
ikreymer / webarchiveplayer
NOTE: This project is no longer being actively developed. Check out https://replayweb.page / https://github.com/webrecorder/replayweb.pa…
☆203 · Jan 22, 2025 · Updated last year
alard / warc-proxy
Serving content from a WARC
☆61 · Jan 5, 2013 · Updated 13 years ago
webrecorder / warcio
Streaming WARC/ARC library for fast web archive IO
☆461 · Jun 10, 2026 · Updated last month
eugeneware / warc
Parse WARC (Web Archive Files) as a node.js stream
☆23 · Oct 20, 2014 · Updated 11 years ago
jjjake / internetarchive
A Python and Command-Line Interface to Archive.org
☆1,883 · Jul 6, 2026 · Updated 2 weeks ago
iipc / warc-specifications
Centralised repository for WARC usage specifications.
☆129 · Apr 4, 2026 · Updated 3 months ago
ArchiveTeam / urls-sources
Sources for urls-grab.
☆14 · Jun 20, 2026 · Updated last month
ArchiveTeam / seesaw-kit
Making a reusable toolkit for writing seesaw scripts
☆75 · Updated this week
N0taN3rd / Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
☆178 · May 19, 2020 · Updated 6 years ago
bibanon / tubeup
Use yt-dlp to download video/metadata and upload to the Internet Archive.
☆509 · May 8, 2026 · Updated 2 months ago
webrecorder / har2warc
Convert HTTP Archive (HAR) -> Web Archive (WARC) format
☆55 · Oct 21, 2018 · Updated 7 years ago
machawk1 / warcreate
Chrome extension to "Create WARC files from any webpage"
☆229 · Dec 5, 2025 · Updated 7 months ago
machawk1 / wail
Web Archiving Integration Layer: One-Click User Instigated Preservation
☆398 · Jun 19, 2026 · Updated last month
webrecorder / browsertrix-crawler
Run a high-fidelity browser-based web archiving crawler in a single Docker container
☆1,087 · Updated this week
ArchiveTeam / IA.BAK
We back up a lot of stuff from around the web; now it's time to back up the Internet Archive, just in case.
☆93 · Jul 13, 2020 · Updated 6 years ago
internetarchive / warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
☆176 · Aug 18, 2025 · Updated 11 months ago
oduwsdl / archivenow
A Tool To Push Web Resources Into Web Archives
☆434 · Jan 23, 2024 · Updated 2 years ago
rogerhoward / lambdazoom
LambdaZoom is a Python-based AWS Lambda function which converts uploaded images to the Deep Zoom tiled image format supported by OpenSead…
☆10 · Feb 4, 2022 · Updated 4 years ago
peterk / warcworker
A dockerized, queued high fidelity web archiver based on Squidwarc
☆62 · Jul 9, 2024 · Updated 2 years ago
JustAnotherArchivist / little-things
The little things give you away... A collection of various small helper stuff – Mirror repo only, no longer kept in sync, refer to gitea.…
☆24 · Sep 11, 2020 · Updated 5 years ago
internetarchive / heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
☆3,280 · Updated this week