internetarchive/CDX-Writer

Python script to create CDX index files of WARC data

☆22

Alternatives and similar repositories for CDX-Writer

Users who are interested in CDX-Writer are comparing it to the repositories listed below.

evanrmurphy / SweetScript
A lispy language that compiles into JavaScript, strongly influenced by Arc.
☆14 · Feb 18, 2011 · Updated 15 years ago
rajbot / CDX-Writer
Python script to create CDX index files of WARC data
☆16 · Sep 7, 2018 · Updated 7 years ago
ArchiveTeam / terroroftinytown-client-grab
The Seesaw pipeline grab script for the URLTeam (terroroftinytown) project
☆28 · Jul 17, 2025 · Updated last year
internetarchive / dweb-archive
☆59 · Jan 6, 2023 · Updated 3 years ago
sacado / arc2c
Arc Lisp to C compiler
☆33 · Aug 13, 2008 · Updated 17 years ago
internetarchive / sandcrawler
Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki
☆28 · Jul 31, 2024 · Updated last year
natliblux / warc-safe
A tool for detecting viruses and NSFW material in WARC files
☆18 · Updated this week
le-moulin-studio / java-semantic-diff
Experiment about a semantic-based diff tool for Java language.
☆12 · Mar 28, 2015 · Updated 11 years ago
ALIADA / aliada-tool
Aliada tool implementation
☆37 · Mar 31, 2017 · Updated 9 years ago
IonicaBizau / airplane-game
Two player game to target the opponent airplane.
☆14 · Feb 13, 2025 · Updated last year
internetarchive / wayback-diff
React components to render differences between captures at the Wayback Machine
☆43 · Jul 6, 2026 · Updated 2 weeks ago
internetarchive / ia-hadoop-tools
☆23 · Feb 22, 2024 · Updated 2 years ago
Co-dfns / apixlib
An Image Dictionary for Co-dfns
☆14 · Jun 16, 2017 · Updated 9 years ago
coraliefreyja / rainbow
arc in java
☆59 · Oct 12, 2010 · Updated 15 years ago
ArchiveTeam / ludios_wpull
wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved
☆31 · Sep 20, 2025 · Updated 10 months ago
blakemcbride / APLEditor
APL function editor written in APL
☆12 · Mar 9, 2026 · Updated 4 months ago
ikreymer / browsertrix
(Note: This repository is obsolete, please see the new Browsertrix webrecorder/browsertrix) Browser-Based On-Demand Web Archiving Automat…
☆38 · Apr 23, 2019 · Updated 7 years ago
w3c-ccg / hashlink
An IETF specification for cryptographic hyperlinking
☆15 · May 2, 2021 · Updated 5 years ago
benbalter / naughty_or_nice
You've made the list, we'll help you check it twice. Given a domain-like string, verifies inclusion in a list you provide.
☆19 · Nov 13, 2020 · Updated 5 years ago
WikipediaLibrary / externallinks
External link tracking tool for Wikimedia partnerships
☆11 · Updated this week
roryk / quantum-diceware
Diceware random password generation using the ANU quantum random number server as the randomness source
☆17 · Oct 23, 2018 · Updated 7 years ago
webrecorder / cdxj-indexer
CDXJ Indexing of WARC/ARCs
☆35 · May 11, 2026 · Updated 2 months ago
w3c / validate-repos
Test whether W3C spec repos match a set of best practices
☆21 · Updated this week
typedb / typedb-protocol
TypeDB (Core and Cloud) RPC Communication Protocol
☆18 · Jul 6, 2026 · Updated 2 weeks ago
w3c-ccg / lds-jws2020
Linked Data Signatures for JWS
☆13 · Aug 5, 2022 · Updated 3 years ago
MichielDeMey / delijn-api
RESTful API documentation for De Lijn
☆10 · Jul 4, 2015 · Updated 11 years ago
oduwsdl / MementoEmbed
A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (me…
☆14 · Nov 15, 2021 · Updated 4 years ago
oduwsdl / QueryClassification
Source code for domain classification (scholar or non-scholar) of a web query.
☆11 · May 31, 2016 · Updated 10 years ago
EFForg / https-everywhere-lib-wasm
A library for HTTPS Everywhere which compiles to WASM
☆16 · Feb 3, 2021 · Updated 5 years ago
library-ucsb / metadata-ci
CI scripts for validating and processing metadata
☆11 · Dec 7, 2019 · Updated 6 years ago
markrwilliams / bfa
Builders for attrs
☆11 · Jul 31, 2019 · Updated 6 years ago
michelebucelli / game-off-2016
A javascript coding game
☆13 · Nov 30, 2016 · Updated 9 years ago
nasa / WinASSIST
☆10 · Jan 21, 2016 · Updated 10 years ago
spatie / bpost-address-webservice
An API wrapper for bpost's address webservice
☆18 · Jan 11, 2024 · Updated 2 years ago
internetarchive / dweb-gateway
Decentralized web Gateway for Internet Archive
☆20 · Jan 4, 2020 · Updated 6 years ago
internetarchive / warctools
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
☆176 · Aug 18, 2025 · Updated 11 months ago
PerformanceHorizonGroup / apidocs
Docs for the public Partnerize API
☆12 · May 7, 2021 · Updated 5 years ago
iipc / jwarc
Java library for reading and writing WARC files with a typed API
☆60 · Jun 27, 2026 · Updated 3 weeks ago
simeonf / sqlalchemy-tutorial
Sample code and slides for an SQL Alchemy ORM tutorial
☆13 · Jan 22, 2015 · Updated 11 years ago