internetarchive/warc

internetarchive / warc

Python library for reading and writing warc files

☆249

Alternatives and similar repositories for warc

Users who are interested in warc are comparing it to the libraries listed below.

internetarchive / warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
☆176 · Aug 18, 2025 · Updated 11 months ago
internetarchive / warcprox
WARC writing MITM HTTP/S proxy
☆456 · Jun 17, 2026 · Updated last month
internetarchive / CDX-Writer
Python script to create CDX index files of WARC data
☆22 · Sep 4, 2025 · Updated 10 months ago
webrecorder / warcio
Streaming WARC/ARC library for fast web archive IO
☆462 · Jun 10, 2026 · Updated last month
commoncrawl / gzipstream
gzipstream allows Python to process multi-part gzip files from a streaming source
☆23 · Feb 24, 2017 · Updated 9 years ago
ikreymer / webarchiveplayer
NOTE: This project is no longer being actively developed. Check out https://replayweb.page / https://github.com/webrecorder/replayweb.pa…
☆203 · Jan 22, 2025 · Updated last year
rajbot / CDX-Writer
Python script to create CDX index files of WARC data
☆16 · Sep 7, 2018 · Updated 7 years ago
webrecorder / pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
☆1,682 · Apr 10, 2026 · Updated 3 months ago
webrecorder / warcit
Convert Directories, Files and ZIP Files to Web Archives (WARC)
☆99 · Apr 22, 2025 · Updated last year
Ahnfelt / AlgorithmWStepByStep
Type inference for ML-like languages. A port to F# of "Algorithm W Step by Step" by Martin Grabmüller.
☆11 · Sep 17, 2014 · Updated 11 years ago
xu-hao / QueryArrow
A semantically unified SQL and NoSQL query and update system
☆18 · Jan 20, 2019 · Updated 7 years ago
webyrd / declarative-semantics
miniKanren implementation of 'Declarative semantics for functional languages: compositional, extensional, and elementary' by Jeremy Siek…
☆15 · Mar 16, 2018 · Updated 8 years ago
leppert / omniauth-pocket
An Omniauth Strategy for Pocket
☆15 · Mar 5, 2017 · Updated 9 years ago
alard / megawarc
Nondestructive warc-in-tar to warc conversion
☆27 · Apr 21, 2013 · Updated 13 years ago
dhamaniasad / WARCTools
A list of tools related to W(eb)ARC(hive)
☆71 · Nov 1, 2014 · Updated 11 years ago
odie5533 / WarcProxy
Saves proxied HTTP traffic to a WARC file.
☆28 · Oct 22, 2013 · Updated 12 years ago
ArchiveTeam / grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
☆1,601 · May 23, 2025 · Updated last year
recrm / ArchiveTools
A collection of tools for archiving and analysing the internet.
☆79 · Jul 6, 2022 · Updated 4 years ago
chfoo / warcat
Tool and library for handling Web ARChive (WARC) files.
☆165 · Oct 11, 2024 · Updated last year
tsikov / vcr
Store and replay results of http calls for easier testing of external services.
☆14 · Jun 17, 2017 · Updated 9 years ago
gregr / dKanren
miniKanren variant with a functional syntax, expressing disjunction via pattern matching
☆17 · Mar 28, 2020 · Updated 6 years ago
internetarchive / wayback-diff
React components to render differences between captures at the Wayback Machine
☆43 · Jul 6, 2026 · Updated 2 weeks ago
esmero / archipelago-documentation
Archipelago Commons' ever evolving Documentation Repository
☆26 · Jun 23, 2026 · Updated 3 weeks ago
Rhizome-Conifer / conifer
Collect and revisit web pages.
☆1,542 · May 12, 2026 · Updated 2 months ago
helgeho / ArchiveSpark
An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed…
☆161 · Oct 8, 2025 · Updated 9 months ago
silky / super-reference
Web-based reference manager, written in Haskell.
☆31 · May 7, 2016 · Updated 10 years ago
internetarchive / heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
☆3,283 · Jul 15, 2026 · Updated last week
internetarchive / brozzler
brozzler - distributed browser-based web crawler
☆809 · Jul 7, 2026 · Updated 2 weeks ago
sergey-pashaev / helm-cscope
Use xcscope with helm!
☆12 · Jan 10, 2015 · Updated 11 years ago
Shinmera / parasol
A Common Lisp painting application
☆32 · May 17, 2026 · Updated 2 months ago
tmbdev-archive / archivefs
An archival and backup file system for Linux using FUSE.
☆25 · Jan 22, 2017 · Updated 9 years ago
gwu-libraries / social-feed-manager
"Old SFM" -- manage rules and streams from social data sources, starting with Twitter.
☆86 · Aug 10, 2023 · Updated 2 years ago
ArchiveTeam / ArchiveBot
ArchiveBot, an IRC bot for archiving websites
☆418 · Apr 17, 2026 · Updated 3 months ago
machawk1 / warcreate
Chrome extension to "Create WARC files from any webpage"
☆229 · Dec 5, 2025 · Updated 7 months ago
Data-Horde / ytcc-archive
archiving community contributions on YouTube: unpublished captions, title and description translations and caption credits
☆11 · Oct 29, 2020 · Updated 5 years ago
ukwa / webarchive-discovery
Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t…
☆133 · Nov 21, 2025 · Updated 8 months ago
DocNow / awesome-social-media-archiving
Tools for helping you work with web platform archive downloads.
☆18 · Mar 27, 2020 · Updated 6 years ago
vinaygoel / archive-analysis
Tools to analyze web archives
☆20 · Jul 12, 2016 · Updated 10 years ago
alard / wget-lua
Wget with Lua extension
☆24 · Dec 17, 2015 · Updated 10 years ago