sangaline/scrapy-wayback-machine

A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

☆122

Alternatives and similar repositories for scrapy-wayback-machine

Users who are interested in scrapy-wayback-machine are comparing it to the repositories listed below.

sangaline / wayback-machine-scraper
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
☆479 · Feb 23, 2024 · Updated 2 years ago
alecxe / scrapy-beautifulsoup
Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup
☆22 · Sep 26, 2016 · Updated 9 years ago
ThomasAitken / Scrapy-Testmaster
The most advanced debugging and testing tool for Scrapy
☆16 · Apr 19, 2023 · Updated 3 years ago
scrapinghub / flatson
Tool to flatten stream of JSON-like objects, configured via schema
☆33 · Oct 19, 2019 · Updated 6 years ago
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆127 · Jun 8, 2026 · Updated last month
midwork-finds-jobs / duckdb-web-archive
DuckDB extension to fetch pages from Wayback Machine & Common Crawl
☆21 · Jun 27, 2026 · Updated last month
7jdope8 / Bitcoin_Bruters_Toolkit
Full suite to turn your seed keys into a private key and Bitcoin address, then search for matching wallets with a positive balance
☆10 · Mar 31, 2024 · Updated 2 years ago
lorehov / mongolock
Python distributed lock with mongodb backend
☆13 · Jun 11, 2023 · Updated 3 years ago
GitGuild / gitguild_whitepaper
A repository for the Git Guild whitepaper, example contracts, and other high-level documents.
☆12 · Sep 13, 2016 · Updated 9 years ago
sangaline / advanced-web-scraping-tutorial
The Zipru scraper developed in the Advanced Web Scraping Tutorial.
☆426 · Mar 19, 2017 · Updated 9 years ago
hereafter / power-taskbar
Productive and cosmetic enhancements to the Windows taskbar.
☆10 · May 26, 2023 · Updated 3 years ago
MnO2 / zw-fast-quantile
Zhang Wang Fast Approximate Quantiles Algorithm in Rust
☆14 · Jul 12, 2026 · Updated 2 weeks ago
gojiplus / statqa
Extract Stats Q/A from Tables With Provenance
☆26 · Dec 27, 2025 · Updated 7 months ago
derickr / osm-year-in-edits
☆25 · Jan 20, 2014 · Updated 12 years ago
scrapy-plugins / scrapy-magicfields
Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes etc.
☆56 · Mar 16, 2022 · Updated 4 years ago
codinglab2017 / Coding_Lab
☆19 · Jun 15, 2017 · Updated 9 years ago
quantrocket-codeload / moonshot-intro
Introductory tutorial for Moonshot demonstrating data collection, universe selection, and backtesting of an end-of-day momentum strategy.
☆10 · Apr 21, 2026 · Updated 3 months ago
AZHenley / SirDrCaptain
A simple static website generator in Python
☆13 · Nov 22, 2019 · Updated 6 years ago
GitGuild / gitguild
Governance of a git repository, using PGP identities.
☆16 · Jan 1, 2017 · Updated 9 years ago
TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆121 · Apr 8, 2026 · Updated 3 months ago
rsksmart / bips
Bitcoin BIPs
☆16 · Jul 3, 2018 · Updated 8 years ago
patriciogonzalezvivo / RandomCity
☆18 · Oct 19, 2016 · Updated 9 years ago
rmax / scrapy-inline-requests
A decorator to write coroutine-like spider callbacks.
☆109 · Dec 26, 2022 · Updated 3 years ago
nyov / scrapyext
scrapy-extras -- a collection of code samples and modules for the Scrapy framework.
☆14 · Dec 14, 2020 · Updated 5 years ago
jairopaiva / BitcoinExprCracker
This project is the result of a personal study of the Secp256k1 algorithm. Its goal is to obtain, using only the values of the …
☆11 · Dec 7, 2022 · Updated 3 years ago
coderefinery / jupyter
Jupyter notebooks - A tool to write and share executable notebooks and data visualization
☆10 · Feb 5, 2026 · Updated 5 months ago
hurchalla / modular_arithmetic
Clockwork: A Modular Arithmetic library for C++
☆14 · May 1, 2026 · Updated 2 months ago
svk31 / graphenejs-lib
Pure JavaScript Bitshares/Graphene library for node.js and browsers
☆18 · Jan 24, 2017 · Updated 9 years ago
AccordBox / awesome-scrapy
A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
☆561 · Dec 28, 2022 · Updated 3 years ago
mustardamus / generator-grail
Yeoman Generator for a modular One Page Application with Gulp, CoffeeScript, Stylus, Browserify, BrowserSync and Mocha. Vue.js, jQuery, S…
☆14 · Nov 6, 2015 · Updated 10 years ago
rsksmart / faucet
RSK testnet faucet website
☆16 · Sep 13, 2019 · Updated 6 years ago
elacuesta / scrapy-pyppeteer
Pyppeteer integration for Scrapy
☆58 · Feb 26, 2021 · Updated 5 years ago
husseinmoghnieh / graphql-polymorphism
☆12 · Aug 13, 2018 · Updated 7 years ago
PacktPublishing / Elasticsearch-7-Quick-Start-Guide
Elasticsearch 7 Quick Start Guide, published by Packt
☆11 · Jan 30, 2023 · Updated 3 years ago
gshipley / book-insultapp
☆15 · Nov 7, 2018 · Updated 7 years ago
spring-attic / spring-cloud-stream-binder-ibm-mq
☆10 · Jul 15, 2022 · Updated 4 years ago
n1tecki / Geography-of-Open-Source-Software
☆35 · Sep 7, 2022 · Updated 3 years ago
cverluise / openPatstat
Load, build and explore Patstat using the Google Cloud Platform
☆10 · Jan 19, 2019 · Updated 7 years ago
demining / Endomorphism-Secp256k1
Speed up secp256k1 with endomorphism
☆14 · Dec 7, 2022 · Updated 3 years ago