seomoz/reppy

Modern robots.txt Parser for Python

☆195

Alternatives and similar repositories for reppy

Users who are interested in reppy are comparing it to the libraries listed below.

seomoz / url-py
URL Transformation, Sanitization
☆104 · Jan 16, 2024 · Updated 2 years ago
edsu / microdata
Python library for extracting HTML microdata
☆168 · May 8, 2023 · Updated 3 years ago
seomoz / simhash-db-py
Python API for Various DB-Backed Simhash Clusters
☆64 · Mar 16, 2017 · Updated 9 years ago
scrapinghub / extruct
Extract embedded metadata from HTML markup
☆966 · Apr 1, 2026 · Updated 3 months ago
seomoz / uri_parser
A fast URI parser that wraps Google's Chromium URL canonicalization library
☆16 · Oct 25, 2023 · Updated 2 years ago
scrapy / protego
A pure-Python robots.txt parser with support for modern conventions.
☆90 · Updated this week
esprehn / fx-framework
A framework for rapidly building Polymer apps.
☆12 · Nov 19, 2015 · Updated 10 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Apr 8, 2026 · Updated 3 months ago
pferrel / solr-recommender
☆16 · Sep 13, 2016 · Updated 9 years ago
PrinterFramework / CLI
🖨️ Printer: Productivity Focused Next.js CLI Tool
☆11 · Nov 24, 2023 · Updated 2 years ago
linkedpipes / applications
🖥 LinkedData-based Applications generator
☆19 · Updated this week
GateNLP / ultimate-sitemap-parser
Ultimate Website Sitemap Parser
☆255 · Jun 16, 2026 · Updated last month
msgflo / msgflo-python
Python participant support for MsgFlo
☆14 · May 23, 2020 · Updated 6 years ago
zmap / ztag
Tagging and annotation framework for scan data
☆100 · Oct 16, 2018 · Updated 7 years ago
TeamHG-Memex / scrapy-kafka-export
Scrapy extension which writes crawled items to Kafka
☆31 · Apr 8, 2026 · Updated 3 months ago
dragnet-org / dragnet
Just the facts -- web page content extraction
☆1,274 · Jul 8, 2025 · Updated last year
KorAP / Krill
A Corpus Data Retrieval Index using Lucene for Look-Ups
☆20 · Jul 8, 2026 · Updated last week
seomoz / qless-py
Python Bindings for qless
☆47 · Sep 23, 2019 · Updated 6 years ago
john-kurkowski / tldextract
Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).
☆2,011 · Apr 21, 2026 · Updated 3 months ago
eridal / loda.sh
Bash-flavored lodash port
☆11 · May 1, 2016 · Updated 10 years ago
iAcquire / gearnado
Experimental Distributed Web Crawling with Python + Gearman
☆22 · May 2, 2012 · Updated 14 years ago
scoder / acora
Fast multi-keyword search engine for text strings
☆258 · Sep 14, 2024 · Updated last year
Parsely / schemato
Modularly extensible semantic metadata validator
☆85 · Dec 10, 2015 · Updated 10 years ago
vaibkumr / DatasetScraper
Tool to create image datasets for machine learning problems by scraping search engines like Google, Bing and Baidu.
☆17 · Apr 20, 2019 · Updated 7 years ago
seomoz / simhash-py
Simhash and near-duplicate detection
☆422 · May 15, 2023 · Updated 3 years ago
ushahidi / geograpy
Extract countries, regions and cities from a URL or text
☆216 · Sep 10, 2020 · Updated 5 years ago
TeamHG-Memex / page-compare
Simple heuristic for measuring web page similarity (& data set)
☆91 · Apr 8, 2026 · Updated 3 months ago
bhavishya235 / Web-Classification
This project deals with hierarchical classification of web pages based on the dmoz dataset.
☆14 · Apr 10, 2014 · Updated 12 years ago
tpoisot / nxfa2
Force-Atlas 2 graph layout in networkx
☆22 · Sep 30, 2014 · Updated 11 years ago
delvelabs / htcap
htcap is a web application scanner able to crawl single page application (SPA) in a recursive manner by intercepting ajax calls and DOM c…
☆18 · Sep 23, 2025 · Updated 9 months ago
generalov / django-resubmit
Stateful widgets for Django upload
☆15 · Oct 3, 2016 · Updated 9 years ago
femtotrader / pandas_datareaders_unofficial
[DEPRECATED] Unofficial Python Pandas DataReader objects with requests and requests_cache
☆16 · Mar 5, 2018 · Updated 8 years ago
scrapinghub / frontera
A scalable frontier for web crawlers
☆1,332 · Jun 6, 2025 · Updated last year
lenzenmi / asyncio_dispatch
Event signalling for Python and asyncio
☆20 · Nov 12, 2015 · Updated 10 years ago
zygmuntz / stardose
A recommender system for GitHub repositories
☆14 · Jun 21, 2014 · Updated 12 years ago
mapio / py-web-graph
A simple package that allows using WebGraph data in Python (via the Jython interpreter).
☆20 · Oct 21, 2020 · Updated 5 years ago
datalib / libextract
Extract data from websites using basic statistical magic
☆505 · Oct 2, 2020 · Updated 5 years ago
verifid / ner-d
Python module for Named Entity Recognition (NER) using natural language processing.
☆13 · May 30, 2021 · Updated 5 years ago
idlesign / calibre-bookradar
Calibre plugin. Searches for book metadata on bookradar.org
☆10 · Nov 10, 2020 · Updated 5 years ago