internetarchive / crawling-for-nomore404
☆26, updated 3 weeks ago
Alternatives and similar repositories for crawling-for-nomore404
Users who are interested in crawling-for-nomore404 are comparing it to the repositories listed below.
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki (☆27, updated 10 months ago)
- A library for HTTPS Everywhere which compiles to WASM (☆16, updated 4 years ago)
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. (☆56, updated 10 months ago)
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives (☆15, updated 4 years ago)
- A Memento TimeGate (☆43, updated 5 years ago)
- A Memento Aggregator CLI and Server in Go (☆65, updated 3 months ago)
- My collection of scripts that can be used on MediaWiki sites such as Wikipedia. (☆12, updated 6 months ago)
- Web archive index server based on RocksDB (☆34, updated last month)
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… (☆54, updated this week)
- CDXJ Indexing of WARC/ARCs (☆26, updated 6 months ago)
- ☆10, updated 3 years ago
- Comparing warc files (☆17, updated 6 years ago)
- Trough: Big data, small databases. (☆42, updated 11 months ago)
- Command line tool for digging into WARC files (☆40, updated 3 weeks ago)
- ☆18, updated 5 years ago
- A copyright violation detector running on Wikimedia Cloud Services (☆42, updated 5 months ago)
- React components to render differences between captures at the Wayback Machine (☆34, updated 2 months ago)
- ☆70, updated 2 weeks ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. (☆13, updated 8 months ago)
- Guerilla Open Access Manifesto, by Aaron Swartz, in several languages (☆8, updated last year)
- Nondestructive warc-in-tar to warc conversion (☆26, updated 12 years ago)
- Perpetual Access To The Scholarly Record (☆120, updated 10 months ago)
- FOSSmarks - A practical guide to understanding trademarks in the context of Free and Open Source Software projects. (☆24, updated last year)
- A suite of tools to store and retrieve binary data in DNS records, and a browser that can surf pages served over DNS instead of HTTP (☆18, updated 3 years ago)
- Archiving GitHub (☆9, updated 7 months ago)
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… (☆19, updated last year)
- Proxies third-party PDF files and HTML pages with the Hypothesis client embedded, so you can annotate them (☆23, updated last week)
- Command line tool to convert a file in the WARC format to a file in the ZIM format (☆58, updated 3 months ago)
- Dynamic ToS;DR CMS, used on our front page (☆49, updated 6 months ago)
- Web-based whois gateway written in Python for lighttpd (☆26, updated 6 months ago)