internetarchive / crawling-for-nomore404
☆26 Updated last week
Alternatives and similar repositories for crawling-for-nomore404:
Users who are interested in crawling-for-nomore404 are comparing it to the repositories listed below.
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆26 Updated 8 months ago
- Archiving GitHub ☆9 Updated 4 months ago
- Dynamic ToS;DR CMS, used in our frontpage ☆49 Updated 4 months ago
- Official Python package for ArchiveBox, the self-hosted internet archiving solution. ☆13 Updated 6 months ago
- URLTeam's second generation of URL shortener archiving tools ☆75 Updated 2 months ago
- Scripts for Internet Archive ☆13 Updated 2 weeks ago
- A Memento TimeGate ☆42 Updated 4 years ago
- A framework for quick web archiving; canonical repository: https://gitea.arpa.li/JustAnotherArchivist/qwarc ☆27 Updated 3 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆52 Updated last month
- A suite of tools to store and retrieve binary data in DNS records, and a browser that can surf pages served over DNS instead of HTTP ☆17 Updated 3 years ago
- 🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces. ☆53 Updated 7 months ago
- Documentation: https://github.com/ambanum/CGUs#exploring-the-versions-history ☆8 Updated 3 years ago
- Tool to import files from the Internet Archive to Wikimedia Commons. ☆16 Updated 2 months ago
- A copyright violation detector running on Wikimedia Cloud Services ☆41 Updated 3 months ago
- JavaScript Wiki Browser - Semi-automatic editing tool with no download required. ☆23 Updated 7 months ago
- A cute little Bash library for blazing fast argument parsing ☆10 Updated 2 years ago
- Archiving URLs (outlinks) from a variety of sources. ☆21 Updated last month
- Web Discovery Project ☆53 Updated this week
- Web-based whois gateway written in Python for lighttpd ☆26 Updated 3 months ago
- React components to render differences between captures at the Wayback Machine ☆33 Updated this week
- ☆68 Updated last week
- Wikipedia citation tool for Google Books, New York Times, ISBN, DOI and more ☆22 Updated 8 years ago
- Github mirror of "analytics/quarry/web" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_acce… ☆43 Updated 2 years ago
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch… ☆19 Updated last year
- WDQ to SPARQL translator ☆9 Updated 7 years ago
- A library for HTTPS Everywhere which compiles to WASM ☆16 Updated 4 years ago
- ☆137 Updated this week
- The repo for the PetScan tool ☆50 Updated 2 weeks ago
- Conifer setup and deployment via Ansible ☆12 Updated 4 years ago
- Archiving YouTube dislikes with YouTube's api. ☆11 Updated 3 years ago