povilasb/scrapy-html-storage

Scrapy downloader middleware that stores response HTMLs to disk.

☆18

Alternatives and similar repositories for scrapy-html-storage

Users who are interested in scrapy-html-storage are comparing it to the libraries listed below.

scrapinghub / andi
Library for annotation-based dependency injection
☆24 · Updated this week
scrapy-plugins / scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
☆45 · Mar 26, 2021 · Updated 5 years ago
ownport / scrapy-dblite
Simple library for storing Scrapy Items in sqlite database
☆12 · Jan 28, 2016 · Updated 10 years ago
TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆121 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / tor-proxy
a tor socks proxy docker image
☆12 · Apr 8, 2026 · Updated 3 months ago
itamarst / txtulip
Run Twisted on the Tulip/asyncio event loop
☆12 · Aug 23, 2016 · Updated 9 years ago
mozilla / python-spidermonkey
Spidermonkey wrapper for Python
☆18 · Apr 5, 2019 · Updated 7 years ago
fsouza / hls-rip
Tool for ripping m3u8 playlists/segments.
☆15 · Dec 22, 2021 · Updated 4 years ago
KayneWest / DeepSpeech
project trying to replicate http://arxiv.org/pdf/1412.5567v2.pdf
☆12 · Mar 22, 2015 · Updated 11 years ago
peterbourgon / rb
High-performance in-memory ring buffer
☆29 · Apr 8, 2026 · Updated 3 months ago
Gingerbreadfork / Cutlery
Python Script for Copywriters to Gather Data from Competing Content and Find Keyword Overlap
☆15 · Apr 23, 2022 · Updated 4 years ago
iaincollins / structured-data-api
A simple platform for managing structured data.
☆28 · Feb 28, 2022 · Updated 4 years ago
scrapinghub / web-poet
Web scraping Page Objects core library
☆107 · Jul 10, 2026 · Updated last week
ejulio / spider-feeder
A library to make it easier to load input URLs to start scrapy processes
☆14 · Feb 21, 2021 · Updated 5 years ago
rodxavier / open-pse-initiative
A project that aims to store historical data from the Philippine Stock Exchange (PSE) and make it available to the public through a REST A…
☆11 · Dec 7, 2022 · Updated 3 years ago
kolypto / py-smsframework
Bi-directional SMS gateway with pluggable providers
☆14 · May 24, 2019 · Updated 7 years ago
arnimarj / py-leveldb
Thread-safe Python bindings for LevelDB
☆15 · Sep 1, 2017 · Updated 8 years ago
realslimshanky / Spider-Sense
A browser extension to monitor your spiders deployed on Scrapy Cloud.
☆16 · Mar 8, 2025 · Updated last year
Zyncco / Chrome-Extension
☆11 · Oct 5, 2017 · Updated 8 years ago
niharm / facebook_feed_scraper
Scrapes a given Facebook user's feed for messages, tags, likes, and datetimes of submissions.
☆10 · Jul 3, 2013 · Updated 13 years ago
alexbbt / facebook-group-export
DEPRECATED: Export Members of a Facebook Group to a CSV
☆13 · Jun 30, 2020 · Updated 6 years ago
TeamHG-Memex / docker-tor-rotator
A rotating socks proxy using Tor, Delegate and Haproxy
☆14 · Apr 8, 2026 · Updated 3 months ago
IncomeStreamSurfer / claudeautoblogger
This very simple Python script takes inputs from your business and outputs articles written by Claude.
☆13 · Apr 3, 2024 · Updated 2 years ago
scrapy / scurl
Performance-focused replacement for Python urllib
☆21 · Apr 13, 2026 · Updated 3 months ago
gopherguides / gopher-ai
Claude Code plugins for Go developers - by Gopher Guides
☆17 · Updated this week
plandes / mednlp
Medical natural language parsing and utility library
☆14 · Dec 10, 2025 · Updated 7 months ago
maastrichtlawtech / extraction_libraries
Python libraries for extracting from data sources like Rechtspraak, ECHR, Cellar
☆13 · Jul 2, 2025 · Updated last year
case-contract-testing / contract-case
Next generation contract testing
☆12 · Updated this week
jjonescz / awe
AI-based web extractor
☆12 · Feb 25, 2023 · Updated 3 years ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated 2 years ago
crubba / htmltab
An R package for assembling data frames from HTML tables (fka htmltable)
☆26 · Oct 27, 2018 · Updated 7 years ago
chrispbarlow / arduino-tasks
Arduino-compatible Time-Triggered Cooperative scheduler
☆16 · Aug 3, 2018 · Updated 7 years ago
haseemajaz / Google-Indexing-API-Publisher
Python script designed to simplify the process of submitting URLs to Google's Indexing API for faster and more efficient website indexing…
☆12 · Sep 12, 2023 · Updated 2 years ago
nudge / schema
A Python implementation of SCHEMA - An Algorithm for Automated Product Taxonomy Mapping in E-commerce.
☆16 · Feb 3, 2015 · Updated 11 years ago
scrapy / itemloaders
Library to populate items using XPath and CSS with a convenient API
☆49 · Updated this week
sigmavirus24 / rush
Modular way of implementing rate limiting in Python, with a few handy default implementations
☆65 · Mar 27, 2023 · Updated 3 years ago
odie5533 / WarcMiddleware
WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy.
☆48 · Mar 19, 2018 · Updated 8 years ago
schemaorg / sdopythonapp
Original schema.org python-appengine codebase
☆19 · Apr 10, 2022 · Updated 4 years ago
dwhitena / schedule
Schedule for talks, workshops, etc. w/ links to past talk slides and videos.
☆26 · Nov 9, 2017 · Updated 8 years ago