jjjake / internetarchive
A Python and Command-Line Interface to Archive.org
☆1,819 Updated last week
Alternatives and similar repositories for internetarchive
Users who are interested in internetarchive are comparing it to the libraries listed below.
- The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns ☆1,547 Updated 7 months ago
- Wget-compatible web downloader and crawler. ☆599 Updated last year
- Core Python Web Archiving Toolkit for replay and recording of web archives ☆1,600 Updated last month
- Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2026, WikiTeam has preserved more th… ☆809 Updated last week
- ArchiveBot, an IRC bot for archiving websites ☆406 Updated 5 months ago
- Use yt-dlp to download video/metadata and upload to the Internet Archive. ☆475 Updated 2 months ago
- brozzler - distributed browser-based web crawler ☆778 Updated this week
- A Tool To Push Web Resources Into Web Archives ☆427 Updated last year
- IA's public Wayback Machine (moved from SourceForge) ☆817 Updated last year
- Webrecorder Player for Desktop (OSX/Windows/Linux). (Built with Electron + Webrecorder) ☆445 Updated 5 years ago
- The OpenWayback Development ☆507 Updated 2 years ago
- A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service ☆189 Updated last year
- Web Archiving Integration Layer: One-Click User Instigated Preservation ☆387 Updated 10 months ago
- Run a high-fidelity browser-based web archiving crawler in a single Docker container ☆950 Updated this week
- Tool and library for handling Web ARChive (WARC) files. ☆164 Updated last year
- NOTE: This project is no longer being actively developed. Check out https://replayweb.page / https://github.com/webrecorder/replayweb.pa… ☆201 Updated 11 months ago
- Download an entire website from the Wayback Machine. ☆5,784 Updated last year
- A simple Python wrapper for the archive.is capturing service ☆210 Updated 11 months ago
- Library of Alexandria (LoA in short) is a project that aims to collect and archive documents from the internet. ☆129 Updated last year
- WARC writing MITM HTTP/S proxy ☆436 Updated this week
- Command line tools and libraries for handling and manipulating WARC files (and HTTP contents) ☆169 Updated 5 months ago
- Indexes open directories ☆1,311 Updated last month
- Dezoomify is a web application to download zoomable images from museum websites, image galleries, and map viewers. Many different zoomabl… ☆767 Updated 2 years ago
- archive reddit data as offline friendly web pages ☆173 Updated 5 years ago
- Recover lost websites from the Web Infrastructure ☆91 Updated 5 months ago
- The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords ☆76 Updated 7 months ago
- Making the public domain Loebs more easily downloadable. Data at https://github.com/ryanfb/loebolus-data ☆101 Updated this week
- Simple python script to download Bandcamp albums ☆1,103 Updated 4 months ago
- A tool providing additional ECC protection for optical media (unofficial version) ☆400 Updated 3 months ago
- Bash scripts which interact with Internet Archive Wayback Machine's Save Page Now ☆138 Updated 9 months ago