internetarchive / CDX-Writer
Python script to create CDX index files of WARC data
☆20 Updated 2 months ago
Alternatives and similar repositories for CDX-Writer
Users who are interested in CDX-Writer are comparing it to the libraries listed below.
- Nondestructive warc-in-tar to warc conversion ☆27 Updated 12 years ago
- Convert HTTP Archive (HAR) -> Web Archive (WARC) format ☆54 Updated 7 years ago
- INACTIVE - Service powering snippets on Firefox's about:home. ☆31 Updated 9 months ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 Updated 7 years ago
- Archiving Google+. ☆26 Updated 6 years ago
- craigslist blob service ☆92 Updated 8 years ago
- A Memento TimeGate ☆44 Updated 5 years ago
- Web archiving using Google Chrome ☆46 Updated 5 years ago
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more ☆22 Updated 9 years ago
- Decentralized web archiving ☆20 Updated 7 years ago
- ☆59 Updated 3 years ago
- A Twitter bot that archives tweets on demand. ☆27 Updated 7 years ago
- An extension to help curate a dataset of pages that show in-page pop-ups ☆12 Updated 7 years ago
- Multi-platform Docker container with utilities to process images (imagemagick, exiftool, optipng...). ☆13 Updated this week
- DEPRECATED - Source behind the python 3 wall of superpowers (aka shame) ☆26 Updated 6 years ago
- A list of things related to software, literature, and other content for 🕣 Memento ☆102 Updated last year
- export data from twitter archive and visualize it ☆25 Updated 2 years ago
- Collusion for Chrome (and Safari!) is a browser extension that lets you visualize and, optionally, block the otherwise invisible websites… ☆78 Updated 12 years ago
- URLTeam's second generation of URL shortener archiving tools ☆79 Updated 2 months ago
- ☆31 Updated 11 years ago
- 🔑 Drop-in OAuth client flows for Python on Google App Engine. ☆39 Updated this week
- Archive.org OPDS Bookserver - A standard for digital book distribution ☆130 Updated 7 years ago
- Specialised bot for periodical grabs and video/audio/etc. webpage scrapes. ☆11 Updated 7 years ago
- Grabbing all news. ☆62 Updated 5 years ago
- A fabric & fabtools flavored library whose purpose is to deal with installing, setting up, and deploying an application on a remote server. ☆19 Updated 3 years ago
- A JSON schema for open-source project contribution data. ☆42 Updated last year
- sync a website or local spreadsheet with a google sheet ☆35 Updated 2 years ago
- A command line tool to archive a git repository from GitHub to the Internet Archive. ☆92 Updated 4 years ago
- Datasette plugin that adds a .atom output format ☆13 Updated 3 weeks ago
- Service to deliver sponsored content while preserving privacy. Owned by the Ads team. Deployed in GCP. ☆18 Updated last year