sangaline/wayback-machine-scraper

sangaline / wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

☆479

Alternatives and similar repositories for wayback-machine-scraper

Users who are interested in wayback-machine-scraper are comparing it to the repositories listed below.

sangaline / scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
☆122 · Feb 18, 2024 · Updated 2 years ago
hartator / wayback-machine-downloader
Download an entire website from the Wayback Machine.
☆5,914 · Feb 8, 2024 · Updated 2 years ago
jsvine / waybackpack
Download the entire Wayback Machine archive for a given URL.
☆3,219 · Apr 21, 2025 · Updated last year
akamhy / waybackpy
Wayback Machine API interface & a command-line tool
☆600 · Feb 26, 2024 · Updated 2 years ago
sangaline / advanced-web-scraping-tutorial
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
☆426 · Mar 19, 2017 · Updated 9 years ago
internetarchive / wayback
IA's public Wayback Machine (moved from SourceForge)
☆850 · Mar 1, 2024 · Updated 2 years ago
Anakeyn / Bert_Squad_SEO
This tool provides a "Bert Score" for up to the first 30 pages responding to a question in Google
☆13 · Feb 10, 2020 · Updated 6 years ago
laopunk / KeyFinder
Web-based tool to identify musical scales and their related keys, rendered with React.js
☆12 · Mar 16, 2016 · Updated 10 years ago
matiskay / html-cluster
A command-line tool to cluster HTML pages based on structural and style similarity.
☆20 · Jan 13, 2026 · Updated 6 months ago
yniu87 / ML_Macro
Modeling Macroeconomics with Deep Reinforcement Learning
☆15 · Aug 5, 2019 · Updated 6 years ago
CoryMcCartan / wacolors
Colorblind-friendly Palettes from Washington State
☆16 · Apr 8, 2025 · Updated last year
sangaline / email-spy
A browser extension that lets you find email addresses for any domain with a single click.
☆76 · May 17, 2017 · Updated 9 years ago
GoTrained / Scrapy-Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
☆66 · Aug 5, 2017 · Updated 8 years ago
TeamHG-Memex / page-compare
Simple heuristic for measuring web page similarity (& data set)
☆91 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / extract-html-diff
Extract the difference between two HTML pages
☆33 · Apr 8, 2026 · Updated 3 months ago
kuriwaki / github-demo
Introduction to git for social science students (not software developers)
☆11 · Apr 15, 2019 · Updated 7 years ago
TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆19 · Apr 8, 2026 · Updated 3 months ago
sangaline / reverse-engineering-the-hacker-news-ranking-algorithm
An analysis of historical Hacker News data to determine the ranking algorithm
☆83 · Apr 4, 2017 · Updated 9 years ago
lee2sman / processing-to-love
Notes and examples for getting started coding in LÖVE aka Love aka Love2d for folks with previous experience in Processing, p5.js and the…
☆17 · Dec 26, 2024 · Updated last year
Abbe98 / thor
A platform-agnostic, configurable, and brandable SPARQL editor and visualization interface.
☆15 · Nov 6, 2025 · Updated 8 months ago
dannguyen / nicar-2019-pdfplumbing
NICAR 2019 workshop on using Python and PDFplumber to extract text from PDFs
☆12 · Mar 9, 2019 · Updated 7 years ago
NikolaiT / GoogleScraper
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...), including asynchronous networking support.
☆2,868 · Jul 3, 2021 · Updated 5 years ago
yashwordlife / SportsDataAnalysis
A Hadoop MapReduce application that retrieves data/articles related to sports from sources like NY Times, Commoncrawl, and Twitter and c…
☆13 · Oct 3, 2019 · Updated 6 years ago
georgemandis / copy-open-tabs-urls
Copy the URLs for all your own tabs to the clipboard
☆18 · Jun 6, 2025 · Updated last year
jaeyk / tidyethnicnews
R package for turning Ethnic NewsWatch search results into tidyverse-ready dataframes
☆11 · Dec 7, 2021 · Updated 4 years ago
simmsb / traceroute-spoof
Messing around with XDP and eBPF
☆20 · Oct 7, 2021 · Updated 4 years ago
MrDebugger / bs2json
A Python 3 module that converts your bs4 Tag into a JSON object (dict)
☆16 · Mar 17, 2026 · Updated 4 months ago
pythad / selenium_extensions
Tools that will make writing tests, bots and scrapers using Selenium much easier
☆139 · Dec 7, 2024 · Updated last year
keyATM / keyATM
An R package for Keyword Assisted Topic Models
☆120 · Jan 19, 2026 · Updated 6 months ago
webrecorder / pywb
Core Python Web Archiving Toolkit for replay and recording of web archives
☆1,685 · Apr 10, 2026 · Updated 3 months ago
sudhamjayanthi / upwork-job-scraper
Browsing jobs on Upwork is time-consuming! How about checking them out right from your terminal? 🤩
☆37 · Oct 11, 2021 · Updated 4 years ago
FabienPetitEconomics / WorkingPaperTemplate
WorkingPaperTemplate is a LaTeX template for working papers and presentations.
☆55 · Apr 12, 2024 · Updated 2 years ago
IBM / page-lab
PageLab enables web performance, accessibility, SEO, etc. testing at scale.
☆18 · Feb 16, 2022 · Updated 4 years ago
erikgahner / poliscijournals
Overview of word limits in political science journals
☆40 · Jul 31, 2021 · Updated 4 years ago
rohithvutnoor / DataBite
A Web Application of Vegetable Sales using Association Rules
☆11 · Nov 1, 2018 · Updated 7 years ago
iipc / awesome-web-archiving
An Awesome List for getting started with web archiving
☆2,607 · Apr 27, 2026 · Updated 3 months ago
worldbank / wb-nlp-apps
This repository contains the NLP modeling components and web application implementations of a project for knowledge and data discovery fu…
☆13 · Jun 29, 2021 · Updated 5 years ago
fedenanni / Computational-Text-Analysis-2018-19
2018 Computational Text Analysis Notebooks, University of Mannheim
☆13 · Nov 22, 2018 · Updated 7 years ago
wo80 / bookmarks-viewer
JSON Bookmarks Viewer
☆14 · Jul 2, 2026 · Updated 3 weeks ago