internetarchive / CDX-Writer
Python script to create CDX index files of WARC data
☆20 · Updated this week
Alternatives and similar repositories for CDX-Writer
Users interested in CDX-Writer are comparing it to the repositories listed below.
- Web archiving using Google Chrome ☆47 · Updated 5 years ago
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io ☆38 · Updated 9 years ago
- Making a reusable toolkit for writing seesaw scripts ☆72 · Updated 2 years ago
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more ☆22 · Updated 8 years ago
- URLTeam's second generation of URL shortener archiving tools ☆80 · Updated last month
- ☆60 · Updated 3 years ago
- Archive.org OPDS Bookserver - A standard for digital book distribution ☆130 · Updated 6 years ago
- INACTIVE - Service powering snippets on Firefox's about:home. ☆31 · Updated 6 months ago
- craigslist blob service ☆91 · Updated 8 years ago
- A universal Subscribe/Follow button. ☆168 · Updated 2 years ago
- Repository for the legacy XTools. See https://github.com/x-tools/xtools for the rewrite ☆42 · Updated 8 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆47 · Updated 7 years ago
- Grabbing all news. ☆62 · Updated 5 years ago
- Removes duplicate files from specified folders ☆47 · Updated 10 years ago
- A command line tool to archive a git repository from GitHub to the Internet Archive. ☆91 · Updated 4 years ago
- Recover lost websites from the Web Infrastructure ☆89 · Updated 3 weeks ago
- A validator for syndicated feeds. It works with Atom, RSS feeds as well as OPML and KML formats. ☆119 · Updated 2 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆48 · Updated this week
- A CLI interface to healthchecks.io ☆27 · Updated 2 years ago
- Multi-platform Docker container with utilities to process images (imagemagick, exiftool, optipng...). ☆13 · Updated this week
- One-Click User Instigated Preservation ☆128 · Updated 6 years ago
- A Twitter bot that archives tweets on demand. ☆27 · Updated 7 years ago
- Celery-based task workers for collecting and updating data on WikiApiary. ☆31 · Updated 9 years ago
- A simple REST API to identify requests made from TOR network. ☆27 · Updated 3 years ago
- Specialised bot for periodical grabs and video/audio/etc. webpage scrapes. ☆11 · Updated 7 years ago
- A browser for Python project documentation ☆26 · Updated 2 years ago
- ☆36 · Updated last year
- Update a local archive of your tweets. ☆49 · Updated 12 years ago
- External link tracking tool for Wikimedia partnerships ☆11 · Updated last week
- Random name generator website ☆24 · Updated last year