internetarchive / CDX-Writer
Python script to create CDX index files of WARC data
☆20 Updated 2 weeks ago
Alternatives and similar repositories for CDX-Writer
Users who are interested in CDX-Writer are comparing it to the libraries listed below.
- Python script to create CDX index files of WARC data ☆16 Updated 7 years ago
- Web archiving using Google Chrome ☆47 Updated 5 years ago
- Trough: Big data, small databases. ☆41 Updated last year
- craigslist blob service ☆91 Updated 8 years ago
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆164 Updated last month
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more ☆22 Updated 8 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 Updated 9 years ago
- Decentralized web archiving ☆20 Updated 7 years ago
- INACTIVE - Service powering snippets on Firefox's about:home. ☆31 Updated 7 months ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆54 Updated 6 years ago
- track changes to the news, where news is anything with an RSS feed ☆179 Updated 5 years ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆99 Updated last year
- Centralised repository for WARC usage specifications. ☆117 Updated 10 months ago
- A Memento TimeGate ☆44 Updated 5 years ago
- Converts WARC files to static HTML ☆48 Updated last year
- A dockerized, queued high fidelity web archiver based on Squidwarc ☆61 Updated last year
- Allow anyone with a modern browser to stream a 1GB, 10GB, 100GB, or 1TB file over the Internet and into a happy home. ☆32 Updated 6 years ago
- Fast extraction of all external links from wikipedia ☆12 Updated 6 years ago
- Grabbing all news. ☆62 Updated 5 years ago
- URLTeam's second generation of URL shortener archiving tools ☆80 Updated 2 weeks ago
- ☆30 Updated 11 years ago
- Nondestructive warc-in-tar to warc conversion ☆27 Updated 12 years ago
- Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archive… ☆26 Updated 2 years ago
- CDXJ Indexing of WARC/ARCs ☆28 Updated 9 months ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Web archive index server based on RocksDB ☆35 Updated 2 weeks ago
- Sort-friendly URI Reordering Transform (SURT) python module ☆43 Updated last week
- A queue-controlled browser automation tool for improving web crawl quality ☆62 Updated last month
- A prototype server to swarm multiple DATs for Webrecorder ☆14 Updated 6 years ago
- Tools for helping you work with web platform archive downloads. ☆18 Updated 5 years ago