orangain/scrapy-s3pipeline

Scrapy pipeline to store chunked items into Amazon S3 or Google Cloud Storage bucket.

☆76

Alternatives and similar repositories for scrapy-s3pipeline

Users that are interested in scrapy-s3pipeline are comparing it to the libraries listed below.

orangain / serverless-crawler
Sample of server-less crawler using AWS Fargate and Lambda
☆12 · Dec 5, 2017 · Updated 8 years ago
ejulio / spider-feeder
A library to make it easier to load input URLs to start scrapy processes
☆14 · Feb 21, 2021 · Updated 5 years ago
scrapy-plugins / scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
☆45 · Mar 26, 2021 · Updated 5 years ago
DansProjects / airflow-averageface
Creates a pipeline Airflow and Scrapy to output an average image composition of everyone's face in a given website
☆43 · Oct 13, 2017 · Updated 8 years ago
mgedmin / bootable-iso
Bootable USB disk that lets you choose an ISO image
☆16 · Oct 19, 2020 · Updated 5 years ago
scrapy / itemloaders
Library to populate items using XPath and CSS with a convenient API
☆49 · Updated this week
dbt-athena / athena-utils
Utility functions for dbt projects running on Athena
☆12 · Mar 25, 2025 · Updated last year
cdrx / scrapyd-authenticated
Docker container running scrapyd with HTTP authentication
☆41 · May 14, 2024 · Updated 2 years ago
scrapinghub / scrapy-frontera
More flexible and featured Frontera scheduler for Scrapy
☆36 · Jun 6, 2025 · Updated last year
DataBrewery / learn-data-brewing
Step-by-step introduction to the traditional data warehousing with examples.
☆11 · Mar 14, 2018 · Updated 8 years ago
fand / react-infinite-scroll-container
A simple component for infinite scroll
☆20 · Apr 13, 2016 · Updated 10 years ago
scrapinghub / product-extraction-benchmark
☆16 · Apr 10, 2026 · Updated 3 months ago
scrapy / scurl
Performance-focused replacement for Python urllib
☆21 · Apr 13, 2026 · Updated 3 months ago
scrapinghub / andi
Library for annotation-based dependency injection
☆24 · Jul 21, 2026 · Updated last week
scrapinghub / arche
Analyze scraped data
☆47 · Dec 9, 2019 · Updated 6 years ago
jxltom / scrapymon
Simple Web UI for Scrapy spider management via Scrapyd
☆50 · Jun 25, 2018 · Updated 8 years ago
scrapinghub / spidermon
Scrapy Extension for monitoring spiders execution.
☆562 · May 28, 2026 · Updated 2 months ago
mdbecker / pydata_2013
PyData Boston 2013 talks: "Intro to scikit-learn" & "Realtime Predictive Analytics: Using scikit-learn and RabbitMQ"
☆11 · Jan 5, 2014 · Updated 12 years ago
lopuhin / scrapy-pyppeteer
Use pyppeteer from a Scrapy spider
☆59 · Feb 5, 2020 · Updated 6 years ago
azu / delete-github-branches
CLI: Delete GitHub Branches by pattern matching.
☆16 · Aug 23, 2022 · Updated 3 years ago
zytedata / html-text
☆20 · Oct 6, 2025 · Updated 9 months ago
commoncrawl / gzipstream
gzipstream allows Python to process multi-part gzip files from a streaming source
☆23 · Feb 24, 2017 · Updated 9 years ago
Tiago-Lira / scrapyd-mongodb
Library designed to replace the SQLite backend by a MongoDB backend on Scrapy queue management
☆17 · Sep 2, 2017 · Updated 8 years ago
ljanyst / scrapy-do
A daemon for scheduling Scrapy spiders
☆65 · May 28, 2021 · Updated 5 years ago
scrapinghub / shub
Scrapinghub Command Line Client
☆129 · Jul 22, 2026 · Updated last week
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆127 · Jun 8, 2026 · Updated last month
dlt-hub / dlt-dagster-demo
dlt-dagster-demo
☆14 · Nov 6, 2023 · Updated 2 years ago
pceuropa / youtube-crawler
Youtube crawler & scraper based on scrapy. Written in Python3.
☆16 · Mar 13, 2026 · Updated 4 months ago
acordiner / scrapy-dynamodb
AWS DynamoDB pipeline for Scrapy
☆21 · Mar 26, 2025 · Updated last year
arthurmoreno / setdict
Python dict-like interface for merging dicts with add to set property
☆14 · Apr 13, 2026 · Updated 3 months ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆28 · Apr 8, 2026 · Updated 3 months ago
TeamHG-Memex / scrapy-rotating-proxies
use multiple proxies with Scrapy
☆775 · Apr 8, 2026 · Updated 3 months ago
AccordBox / awesome-scrapy
A curated list of awesome packages, articles, and other cool resources from the Scrapy community.
☆561 · Dec 28, 2022 · Updated 3 years ago
n-surkov / PySparkPipeline
Module for pipelines concept in PySpark
☆16 · Mar 27, 2024 · Updated 2 years ago
alecxe / scrapy-fake-useragent
Random User-Agent middleware based on fake-useragent
☆688 · Sep 18, 2023 · Updated 2 years ago
SecOps-Institute / memcached-server-iplist
List of all Memcached Servers that are vulnerable to DDoS attack vector
☆10 · Dec 21, 2020 · Updated 5 years ago
sensuikan1973 / Flutter_RxDart_GetStarted
I implement Flutter's "GetStarted" with using BLoC pattern (with RxDart)
☆14 · Nov 12, 2020 · Updated 5 years ago
musyoka-morris / pymongoext
An extension for pymongo that adds json schema validation and index management
☆13 · Oct 19, 2019 · Updated 6 years ago
jlhood / serverless-app-ideas
Ideas for serverless applications to be published to the AWS Serverless Application Repository
☆13 · Jun 10, 2018 · Updated 8 years ago