scrapy/protego

A pure-Python robots.txt parser with support for modern conventions.

☆90

Alternatives and similar repositories for protego

Users interested in protego compare it to the libraries listed below.

scrapy / xtractmime
Python implementation of the WHATWG MIME Sniffing Standard (https://mimesniff.spec.whatwg.org/)
☆13 · Jul 9, 2026 · Updated last week
scrapy / itemloaders
Library to populate items using XPath and CSS with a convenient API
☆49 · Updated this week
scrapy / itemadapter
Common interface for data container classes
☆70 · Jul 12, 2026 · Updated last week
scrapinghub / scrapy-poet
Page Object pattern for Scrapy
☆127 · Jun 8, 2026 · Updated last month
scrapy / w3lib
Python library of web-related functions
☆419 · Updated this week
scrapy-plugins / scrapy-jsonschema
Scrapy schema validation pipeline and Item builder using JSON Schema
☆45 · Mar 26, 2021 · Updated 5 years ago
zytedata / flattering
Flatten, format, and export any JSON-like data to CSV (or any other string output).
☆17 · Sep 13, 2021 · Updated 4 years ago
zytedata / zyte-autoextract
Python clients for the Zyte AutoExtract API
☆41 · Jan 17, 2022 · Updated 4 years ago
realslimshanky / Spider-Sense
A browser extension to monitor your spiders deployed on Scrapy Cloud.
☆16 · Mar 8, 2025 · Updated last year
further-reading / scrapy-gui
A simple Qt WebEngine-powered web browser with built-in support for basic Scrapy web scraping.
☆109 · May 21, 2024 · Updated 2 years ago
zytedata / python-zyte-api
Python client for Zyte API
☆30 · Updated this week
scrapinghub / web-poet
Web scraping Page Objects core library
☆107 · Jul 10, 2026 · Updated last week
scrapy / cssselect
CSS Selectors for Python
☆309 · Updated this week
scrapy / scurl
Performance-focused replacement for Python urllib
☆21 · Apr 13, 2026 · Updated 3 months ago
scrapinghub / andi
Library for annotation-based dependency injection
☆24 · Updated this week
zytedata / zyte-spider-templates
Spider templates for automatic crawlers.
☆35 · Mar 26, 2026 · Updated 3 months ago
scrapedia / scrapy-useragents
A downloader middleware to change Scrapy's user agent
☆21 · Apr 13, 2026 · Updated 3 months ago
scrapinghub / spidermon
Scrapy extension for monitoring spider execution.
☆561 · May 28, 2026 · Updated last month
ioxiocom / arangodantic
☆12 · Apr 12, 2024 · Updated 2 years ago
epicserve / django-cache-url
Use cache URLs in your Django application
☆20 · Jan 24, 2026 · Updated 5 months ago
scrapinghub / shub-workflow
☆14 · Updated this week
scrapinghub / price-parser
Extract price amount and currency symbol from a raw text string
☆346 · Mar 19, 2026 · Updated 4 months ago
scrapinghub / shub
Scrapinghub Command Line Client
☆129 · Updated this week
zytedata / zyte-smartproxy-headless-proxy
A complementary proxy for using Smart Proxy Manager (SPM) with headless browsers
☆109 · May 20, 2026 · Updated 2 months ago
scrapy-plugins / scrapy-zyte-smartproxy
Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
☆363 · May 4, 2026 · Updated 2 months ago
TeamHG-Memex / html-text
Extract text from HTML
☆135 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / shublang
Pluggable DSL that uses pipes to perform a series of linear transformations to extract data
☆16 · Jul 9, 2024 · Updated 2 years ago
scrapinghub / scrapy-autounit
Automatic unit test generation for Scrapy.
☆58 · Jul 12, 2021 · Updated 5 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 · Jan 12, 2017 · Updated 9 years ago
michael-shub / curl2scrapy
A simple tool to convert curl requests to Scrapy.
☆45 · Oct 21, 2021 · Updated 4 years ago
povilasb / scrapy-html-storage
Scrapy downloader middleware that stores response HTML to disk.
☆18 · Apr 14, 2026 · Updated 3 months ago
fedora-infra / kitchen
Useful snippets of Python code
☆34 · Dec 30, 2020 · Updated 5 years ago
seomoz / reppy
Modern robots.txt parser for Python
☆195 · Jan 12, 2024 · Updated 2 years ago
kserhii / money-parser
Price and currency parsing utility
☆27 · Mar 6, 2023 · Updated 3 years ago
scrapy-plugins / scrapy-streaming
☆19 · Oct 12, 2016 · Updated 9 years ago
scrapy / scrapy-lint
A linter for Scrapy projects.
☆22 · Jul 7, 2026 · Updated 2 weeks ago
TeamHG-Memex / extract-html-diff
Extract the difference between two HTML pages
☆33 · Apr 8, 2026 · Updated 3 months ago
scrapinghub / extruct
Extract embedded metadata from HTML markup
☆966 · Apr 1, 2026 · Updated 3 months ago
RohanGautam / rust-aws-lambda
Make a Rust executable that runs on AWS Lambda
☆10 · Mar 2, 2021 · Updated 5 years ago