internetarchive / crawling-for-nomore404
☆26 Updated this week
Alternatives and similar repositories for crawling-for-nomore404
Users who are interested in crawling-for-nomore404 are comparing it to the repositories listed below.
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆27 Updated 11 months ago
- Archiving GitHub ☆9 Updated 8 months ago
- React components to render differences between captures at the Wayback Machine ☆35 Updated 2 months ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆56 Updated 11 months ago
- A library for HTTPS Everywhere which compiles to WASM ☆16 Updated 4 years ago
- Saving all questions and answers from Yahoo! Answers. ☆51 Updated 4 years ago
- A Memento TimeGate ☆43 Updated 5 years ago
- Web-based whois gateway written in Python for lighttpd ☆26 Updated 7 months ago
- Perpetual Access To The Scholarly Record ☆120 Updated 11 months ago
- A command line tool to archive a git repository from GitHub to the Internet Archive. ☆94 Updated 4 years ago
- A Memento Aggregator CLI and Server in Go ☆65 Updated 4 months ago
- Scripts for Internet Archive ☆13 Updated 3 months ago
- The file every project should [eventually] have in their repo. ☆24 Updated 6 years ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated 9 months ago
- Wikipedia 1.0 engine & selection tools ☆41 Updated last week
- Github mirror of "analytics/quarry/web" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_acce… ☆43 Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- This collaborative resource aims at empowering all actors countering information manipulation to grow and improve. ☆16 Updated last year
- ☆140 Updated last week
- View the history of public and world readable Matrix rooms ☆80 Updated last year
- My collection of scripts that can be used on MediaWiki sites such as Wikipedia. ☆12 Updated 7 months ago
- Fosstodon's blog, code of conduct, team information, and more. ☆29 Updated 2 weeks ago
- Metadata and per-statute PDFs for the U.S. Statutes at Large through volume 64 (1789-1951). ☆16 Updated 5 years ago
- Search interface for scholarly works ☆85 Updated 11 months ago
- Web archive index server based on RocksDB ☆34 Updated this week
- A fun tool for quickly browsing unsourced snippets on Wikipedia. ☆111 Updated last week
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆15 Updated 4 years ago
- ☆71 Updated this week
- A prototype server to swarm multiple DATs for Webrecorder ☆14 Updated 6 years ago
- Command line tool for digging into WARC files ☆43 Updated 3 weeks ago