alard/warc-proxy

Serving content from a WARC

☆61

Alternatives and similar repositories for warc-proxy

Users who are interested in warc-proxy are comparing it to the libraries listed below.

odie5533 / WarcMiddleware
WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.
☆48 · Updated Mar 19, 2018
maturban / WARCMerge
Merging WARCs into a single WARC file
☆15 · Updated Aug 29, 2014
web-archive-group / heritrix-walkthrough
☆10 · Updated Jun 10, 2016
odie5533 / WarcProxy
Saves proxied HTTP traffic to a WARC file.
☆28 · Updated Oct 22, 2013
mjordan / GitBags
Some ideas on making Bags into Git repositories
☆16 · Updated Dec 23, 2014
ukwa / shine
Prototype SOLR-powered web archive exploration UI.
☆43 · Updated Jun 3, 2020
gwu-libraries / TweetSets
Service for creating Twitter datasets for research and archiving.
☆26 · Updated Dec 7, 2022
odie5533 / WarcMITMProxy
HTTP(S) proxy that saves traffic to a WARC file, using libmitmproxy.
☆16 · Updated Oct 25, 2013
iipc / warc-specifications
Centralised repository for WARC usage specifications.
☆129 · Updated Apr 4, 2026
web-archive-group / hackathon
☆14 · Updated Feb 28, 2017
archivesunleashed / graphpass
GraphPass is a utility to filter networks and provide a default visualization output for Gephi or SigmaJS.
☆17 · Updated Nov 14, 2020
alard / megawarc
Nondestructive warc-in-tar to warc conversion
☆27 · Updated Apr 21, 2013
internetarchive / arch
Web application for distributed compute analysis of Archive-It web archive collections.
☆20 · Updated Mar 24, 2026
archivesunleashed / docker-aut
Docker image for the Archives Unleashed Toolkit
☆12 · Updated Nov 17, 2022
internetarchive / umbra
A queue-controlled browser automation tool for improving web crawl quality
☆68 · Updated May 28, 2026
vinaygoel / archive-analysis
Tools to analyze web archives
☆20 · Updated Jul 12, 2016
UAlbanyArchives / describingWebArchives
Automating description for Web Archives in ArchivesSpace using the Archive-It CDX and Partner Data APIs
☆11 · Updated Aug 10, 2018
internetarchive / surt
Sort-friendly URI Reordering Transform (SURT) python module
☆45 · Updated Sep 11, 2025
webrecorder / warcit
Convert Directories, Files and ZIP Files to Web Archives (WARC)
☆99 · Updated Apr 22, 2025
oduwsdl / ORS
Object Resource Stream and CDXJ Drafts
☆15 · Updated Nov 28, 2018
lintool / warcbase
Warcbase is an open-source platform for managing and analyzing web archives
☆162 · Updated Dec 8, 2017
edsu / memento-cli
A command line utility for listing and searching snapshots in web archives
☆20 · Updated Jun 4, 2026
roryk / quantum-diceware
Diceware random password generation using the ANU quantum random number server as the randomness source
☆17 · Updated Oct 23, 2018
vinaygoel / ars-workshop
Archive Research Services Workshop
☆31 · Updated Sep 29, 2017
iipc / webarchive-commons
Common web archive utility code.
☆65 · Updated Jul 3, 2026
webis-de / wasp
☆28 · Updated Jun 30, 2026
mjordan / ocr_rest
A simple OCR service over REST
☆15 · Updated Jul 29, 2014
ikreymer / pywb-webrecorder
Check out https://github.com/webrecorder/webrecorder for a newer version matching https://webrecorder.io
☆38 · Updated Oct 16, 2015
helgeho / ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…
☆161 · Updated Oct 8, 2025
harvard-lil / waczerciser
Create and edit WARC and WACZ files
☆29 · Updated Dec 6, 2024
LibraryOfCongress / coding-standards
Library of Congress coding standards
☆32 · Updated Jun 17, 2024
ahankinson / pybagit
Python library for manipulating bagit files.
☆20 · Updated Feb 6, 2019
iipc / twittervane
Using social media to steer web archiving and curation.
☆18 · Updated Nov 20, 2015
rajbot / CDX-Writer
Python script to create CDX index files of WARC data
☆16 · Updated Sep 7, 2018
unt-libraries / py-wasapi-client
A client for the Archive-It and Webrecorder WASAPI Data Transfer API
☆16 · Updated Oct 18, 2019
WASAPI-Community / data-transfer-apis
WASAPI data transfer APIs
☆50 · Updated Apr 23, 2022
DocNow / waybackprov
Utility to fetch provenance information from the Internet Archive's Wayback Machine
☆15 · Updated Feb 5, 2026
yasmina85 / OffTopic-Detection
This repository contains a tool and a collections dataset for detecting off-topic pages in web archive collections.
☆17 · Updated Aug 20, 2015
rectalogic / xbmc-fancast-plugin
Plugin for XBMC/Boxee Fancast support
☆27 · Updated Mar 3, 2009