odie5533/WarcMiddleware

WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.

☆48

Alternatives and similar repositories for WarcMiddleware

Users that are interested in WarcMiddleware are comparing it to the libraries listed below.

entaxy-project / entaxy
Your Personal Finance Simple & Private https://entaxy.io
☆10 · Feb 4, 2026 · Updated 5 months ago
odie5533 / WarcMITMProxy
HTTP(S) proxy that saves traffic to a WARC file, using libmitmproxy.
☆16 · Oct 25, 2013 · Updated 12 years ago
gotlium / django-proxylist
Proxy-list management application for Django
☆23 · Mar 5, 2018 · Updated 8 years ago
Famicoman / ia-ul-from-youtubedl
Uploads items into the Internet Archive after they have been downloaded with youtube-dl
☆15 · Feb 28, 2015 · Updated 11 years ago
ArchiveTeam / wpull
Wget-compatible web downloader and crawler.
☆612 · Apr 29, 2024 · Updated 2 years ago
bloomark / f13x
HyperDex, Flask Cryptocurrency Trading Platform, Exchange
☆10 · Aug 13, 2015 · Updated 10 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
chfoo / warcat
Tool and library for handling Web ARChive (WARC) files.
☆165 · Oct 11, 2024 · Updated last year
INA-DLWeb / LiveArchivingProxy
An HTTP Proxy that archives all intercepted traffic.
☆21 · Aug 26, 2014 · Updated 11 years ago
povilasb / scrapy-html-storage
Scrapy downloader middleware that stores response HTMLs to disk.
☆18 · Apr 14, 2026 · Updated 3 months ago
cryptoapi / Bitcoin-Easy-Digital-Downloads
Free Bitcoin Payment Gateway Addon for Wordpress Easy Digital Downloads 2.4+ (or higher). Accept Bitcoin, DASH, Litecoin, Dogecoin, Speed…
☆13 · Aug 4, 2017 · Updated 8 years ago
AnCh7 / FinancialCharting
Stock charts. Application uses ExtJS, HighStock, ServiceStack (with swagger and caching) and Quandl.com (datafeed).
☆16 · Feb 4, 2015 · Updated 11 years ago
aGHz / structominer
Data scraping for a more civilized age
☆17 · Jun 12, 2014 · Updated 12 years ago
BCAPtoken / BCAPToken
Smart contract of BlockChain Capital Ethereum token
☆13 · Apr 27, 2017 · Updated 9 years ago
mogira / MP4Pythonista
Music Player for "Pythonista for iOS"
☆10 · May 13, 2017 · Updated 9 years ago
TeamHG-Memex / extract-html-diff
extract difference between two html pages
☆33 · Apr 8, 2026 · Updated 3 months ago
wcong / ants
open source, distributed, restful crawler engine
☆14 · Feb 3, 2015 · Updated 11 years ago
hugsbrugs / scrapyd-webapp
Scrapyd web application for managing projects, spiders and visualize jobs logs and items
☆14 · Jan 19, 2016 · Updated 10 years ago
themiurgo / twitterstream-downloader
Twitter stream and social network crawling tools
☆17 · Nov 17, 2016 · Updated 9 years ago
ikreymer / webarchive-indexing
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
☆46 · Dec 4, 2017 · Updated 8 years ago
datadesk / nws-wwa
Download watch, warning and advisory data from the National Weather Service
☆12 · Updated this week
TeamHG-Memex / tor-proxy
a tor socks proxy docker image
☆12 · Apr 8, 2026 · Updated 3 months ago
lvanderree / feincms-mercury
Mercury HTML5 WYSIWYG editor for Django - FeinCMS
☆11 · Oct 19, 2011 · Updated 14 years ago
peterwaksman / Narwhal
Narwhal is a keyword and KEY NARRATIVE manager that creates language-aware classes. Because Narwhal does not use NLP it avoids complexity…
☆12 · Oct 16, 2018 · Updated 7 years ago
chandler-stimson / video-player
A web-based video player with playback speed rate control, speed boosting, and more
☆21 · Jul 27, 2024 · Updated last year
webrecorder / pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
☆1,683 · Apr 10, 2026 · Updated 3 months ago
peterk / warcworker
A dockerized, queued high fidelity web archiver based on Squidwarc
☆62 · Jul 9, 2024 · Updated 2 years ago
alard / megawarc
Nondestructive warc-in-tar to warc conversion
☆27 · Apr 21, 2013 · Updated 13 years ago
kpeterson85 / Salesforce-Enhanced-Formula-Editor-Chrome-Extension
Enhances Salesforce formula textareas with the EditArea code editor and provides a "Load Field Details" button that list details about th…
☆13 · Oct 13, 2024 · Updated last year
slyrz / warc
Read and write WARC files in Go
☆50 · Apr 9, 2018 · Updated 8 years ago
trickvi / datapackage
Manage and load dataprotocols.org Data Packages
☆27 · Sep 17, 2015 · Updated 10 years ago
umputun / mongo-auth
mongo docker with auth
☆12 · Jul 24, 2018 · Updated 8 years ago
nalomran / pubmed2doc
Write PubMed search results with two display options (citation or listview) to PDF or Word
☆13 · Oct 18, 2020 · Updated 5 years ago
bkjones / django-taxonomy
A django app to support whatever classification type (category, label, tag) you can dream up.
☆58 · Jun 17, 2019 · Updated 7 years ago
internetarchive / warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
☆176 · Aug 18, 2025 · Updated 11 months ago
dchest / pyblake2
Python extension module implementing BLAKE2 hash function
☆39 · Sep 14, 2020 · Updated 5 years ago
internetarchive / trough
Trough: Big data, small databases.
☆43 · Jul 25, 2024 · Updated last year
InviteBox / hoarder
Django analytics kit for the data-obsessed.
☆18 · Feb 3, 2013 · Updated 13 years ago
danawoodman / django-flatpages-plus
[NOT MAINTAINED] A more robust flatpages app for Django.
☆17 · Jan 15, 2013 · Updated 13 years ago