jjjake/internetarchive


A Python and Command-Line Interface to Archive.org

☆1,887

Alternatives and similar repositories for internetarchive

Users who are interested in internetarchive are comparing it to the libraries listed below.


vmbrasseur / IAS3API
Documentation for the Internet Archive S3 API
☆79 · Jan 25, 2018 · Updated 8 years ago
bibanon / tubeup
Use yt-dlp to download video/metadata and upload to the Internet Archive.
☆509 · May 8, 2026 · Updated 2 months ago
ArchiveTeam / wpull
Wget-compatible web downloader and crawler.
☆613 · Apr 29, 2024 · Updated 2 years ago
internetarchive / wayback
IA's public Wayback Machine (moved from SourceForge)
☆850 · Mar 1, 2024 · Updated 2 years ago
JustAnotherArchivist / archivebot-archives
☆15 · Nov 5, 2018 · Updated 7 years ago
ArchiveTeam / grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
☆1,602 · May 23, 2025 · Updated last year
iipc / awesome-web-archiving
An Awesome List for getting started with web archiving
☆2,607 · Apr 27, 2026 · Updated 2 months ago
iipc / openwayback
The OpenWayback Development
☆522 · Jan 3, 2024 · Updated 2 years ago
webrecorder / pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
☆1,684 · Apr 10, 2026 · Updated 3 months ago
hartator / wayback-machine-downloader
Download an entire website from the Wayback Machine.
☆5,910 · Feb 8, 2024 · Updated 2 years ago
internetarchive / warcprox
WARC writing MITM HTTP/S proxy
☆456 · Jun 17, 2026 · Updated last month
internetarchive / brozzler
brozzler - distributed browser-based web crawler
☆809 · Jul 7, 2026 · Updated 2 weeks ago
JohnMarkOckerbloom / onlinebooks
Selected code and data for The Online Books Page and related applications
☆12 · Jul 1, 2026 · Updated 3 weeks ago
ArchiveTeam / ArchiveBot
ArchiveBot, an IRC bot for archiving websites
☆419 · Apr 17, 2026 · Updated 3 months ago
palewire / savepagenow
A simple Python wrapper and command-line interface for archive.org’s "Save Page Now" capturing service
☆196 · Jun 17, 2026 · Updated last month
ArchiveBox / ArchiveBox
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and mor…
☆28,002 · Updated this week
oduwsdl / archivenow
A Tool To Push Web Resources Into Web Archives
☆434 · Jan 23, 2024 · Updated 2 years ago
jsvine / waybackpack
Download the entire Wayback Machine archive for a given URL.
☆3,218 · Apr 21, 2025 · Updated last year
atomotic / archiviiify
Download digitized books from Internet Archive and view with IIIF, locally and offline.
☆39 · Apr 19, 2024 · Updated 2 years ago
gdamdam / iagitup
Archive GitHub, GitLab, Bitbucket & any git repo to the Internet Archive as portable bundles with rich metadata.
☆102 · Mar 14, 2026 · Updated 4 months ago
Rhizome-Conifer / conifer
Collect and revisit web pages.
☆1,542 · Updated this week
alard / megawarc
Nondestructive warc-in-tar to warc conversion
☆27 · Apr 21, 2013 · Updated 13 years ago
akamhy / waybackpy
Wayback Machine API interface & a command-line tool
☆599 · Feb 26, 2024 · Updated 2 years ago
ArchiveTeam / Ubuntu-Warrior
Scripts to build and boot warrior virtual machine containing Docker
☆122 · Apr 6, 2025 · Updated last year
internetarchive / bookreader
The Internet Archive BookReader
☆1,162 · Updated this week
matthazinski / youtube2internetarchive
Fork of youtube2internetarchive.py
☆12 · Feb 28, 2015 · Updated 11 years ago
ropensci / internetarchive
Search the Internet Archive, retrieve metadata, and download files
☆64 · Dec 2, 2024 · Updated last year
MiniGlome / Archive.org-Downloader
Python3 script to download archive.org books in PDF format
☆1,329 · Jul 10, 2026 · Updated 2 weeks ago
ArchiveTeam / urls-sources
Sources for urls-grab.
☆15 · Jun 20, 2026 · Updated last month
webrecorder / warcio
Streaming WARC/ARC library for fast web archive IO
☆462 · Jun 10, 2026 · Updated last month
WASAPI-Community / data-transfer-apis
WASAPI data transfer APIs
☆50 · Apr 23, 2022 · Updated 4 years ago
WikiTeam / wikiteam
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2026, WikiTeam has preserved more th…
☆857 · Jan 10, 2026 · Updated 6 months ago
ArchiveTeam / NewsGrabber
Grabbing all news.
☆60 · Dec 23, 2019 · Updated 6 years ago
ArchiveTeam / urls-grab
Archiving URLs (outlinks) from a variety of sources.
☆25 · Jun 26, 2026 · Updated last month
ArchiveLabs / api.archivelab.org
Archive.org API Server
☆39 · Nov 1, 2023 · Updated 2 years ago
internetarchive / openlibrary
One webpage for every book ever published!
☆6,573 · Updated this week
internetarchive / warc
Python library for reading and writing warc files
☆249 · Mar 7, 2022 · Updated 4 years ago
ArchiveTeam / wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
☆137 · Mar 19, 2026 · Updated 4 months ago
internetarchive / heritrix3
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
☆3,284 · Jul 15, 2026 · Updated last week