Python script to create CDX index files of WARC data
☆21 · Sep 4, 2025 · Updated 6 months ago
Alternatives and similar repositories for CDX-Writer
Users who are interested in CDX-Writer are comparing it to the libraries listed below:
- Python script to create CDX index files of WARC data ☆16 · Sep 7, 2018 · Updated 7 years ago
- A lispy language that compiles into JavaScript, strongly influenced by Arc. ☆14 · Feb 18, 2011 · Updated 15 years ago
- Arc Lisp to C compiler ☆33 · Aug 13, 2008 · Updated 17 years ago
- Python library for reading and writing warc files ☆248 · Mar 7, 2022 · Updated 4 years ago
- The Seesaw pipeline grab script for the URLTeam (terroroftinytown) project ☆28 · Jul 17, 2025 · Updated 8 months ago
- arc in java ☆59 · Oct 12, 2010 · Updated 15 years ago
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆28 · Jul 31, 2024 · Updated last year
- Editor for New Super Mario Bros. Wii data files ☆66 · Nov 24, 2012 · Updated 13 years ago
- A tool for detecting viruses and NSFW material in WARC files ☆18 · Dec 16, 2025 · Updated 3 months ago
- React components to render differences between captures at the Wayback Machine ☆41 · Updated this week
- An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed… ☆157 · Oct 8, 2025 · Updated 5 months ago
- code and data used to build a training dataset for dragnet models ☆10 · Nov 29, 2020 · Updated 5 years ago
- wpull fork with fixes and faster parsing using html5-parser; used by grab-site; should go away when wpull is similarly improved ☆30 · Sep 20, 2025 · Updated 6 months ago
- IPLD Schema Implementation: parser and utilities ☆16 · Mar 6, 2026 · Updated 2 weeks ago
- (Note: This repository is obsolete, please see the new Browsertrix webrecorder/browsertrix) Browser-Based On-Demand Web Archiving Automat… ☆38 · Apr 23, 2019 · Updated 6 years ago
- You've made the list, we'll help you check it twice. Given a domain-like string, verifies inclusion in a list you provide. ☆19 · Nov 13, 2020 · Updated 5 years ago
- External link tracking tool for Wikimedia partnerships ☆11 · Oct 3, 2025 · Updated 5 months ago
- MIMO platform for advanced communications and PNT applications ☆14 · Dec 8, 2014 · Updated 11 years ago
- Test whether W3C spec repos match a set of best practices ☆21 · Updated this week
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆132 · Nov 21, 2025 · Updated 4 months ago
- A service that provides archive-aware oEmbed-compatible embeddable surrogates (social cards, thumbnails, etc.) for archived web pages (me… ☆14 · Nov 15, 2021 · Updated 4 years ago
- Source code for domain classification (scholar or non-scholar) of a web query. ☆11 · May 31, 2016 · Updated 9 years ago
- ☆20 · Dec 5, 2017 · Updated 8 years ago
- A library for HTTPS Everywhere which compiles to WASM ☆16 · Feb 3, 2021 · Updated 5 years ago
- Book "El camino a un mejor programador" ("The Path to a Better Programmer") ☆19 · Mar 1, 2013 · Updated 13 years ago
- Jupyter Notebooks Relating to Open Context (https://opencontext.org) ☆11 · Oct 14, 2025 · Updated 5 months ago
- Trough: Big data, small databases. ☆42 · Jul 25, 2024 · Updated last year
- CI scripts for validating and processing metadata ☆11 · Dec 7, 2019 · Updated 6 years ago
- Proposed architecture for a Solid server ☆13 · Aug 21, 2020 · Updated 5 years ago
- Builders for attrs ☆11 · Jul 31, 2019 · Updated 6 years ago
- Kaitai Struct YAML (KSY) schema specification ☆15 · Sep 12, 2025 · Updated 6 months ago
- ANNSER is A Neural Network Simulator for Education and Research. ☆10 · Aug 28, 2016 · Updated 9 years ago
- OONI translations ☆13 · Mar 5, 2026 · Updated 2 weeks ago
- Networking library based on anyio ☆10 · Sep 3, 2025 · Updated 6 months ago
- Decentralized web Gateway for Internet Archive ☆21 · Jan 4, 2020 · Updated 6 years ago
- KSFL - Kreative Structured Format Library ☆17 · Feb 21, 2023 · Updated 3 years ago
- thin wrapper around process.hrtime in node and for the performance API in the browser ☆17 · Feb 22, 2025 · Updated last year
- Sample code and slides for an SQL Alchemy ORM tutorial ☆13 · Jan 22, 2015 · Updated 11 years ago
- A simple UTC => TAI converter and hex-encoded TAI (as used by DJBDNS) => UTC datetime.datetime decoder. ☆15 · Apr 12, 2019 · Updated 6 years ago