Modern robots.txt Parser for Python
☆196 · Jan 12, 2024 · Updated 2 years ago
Alternatives and similar repositories for reppy
Users that are interested in reppy are comparing it to the libraries listed below.
- URL Transformation, Sanitization ☆103 · Jan 16, 2024 · Updated 2 years ago
- mltk - Moz Language Tool Kit ☆12 · Mar 6, 2015 · Updated 11 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Mar 16, 2017 · Updated 9 years ago
- Alternative robots parser module for Python ☆22 · Apr 8, 2026 · Updated last month
- C++ bindings for url parsing and sanitization ☆19 · May 2, 2024 · Updated 2 years ago
- Extract embedded metadata from HTML markup ☆961 · Apr 1, 2026 · Updated last month
- Pipeline for distributed Natural Language Processing, made in Python ☆65 · Jan 31, 2017 · Updated 9 years ago
- A pure-Python robots.txt parser with support for modern conventions. ☆86 · Jan 29, 2026 · Updated 3 months ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆24 · Apr 8, 2026 · Updated last month
- Copy and paste text across LAN devices ☆11 · Jul 3, 2017 · Updated 8 years ago
- Ultimate Website Sitemap Parser ☆250 · Jan 25, 2026 · Updated 3 months ago
- Just the facts -- web page content extraction ☆1,276 · Jul 8, 2025 · Updated 10 months ago
- JSON Logging for Sanic ☆10 · Sep 1, 2021 · Updated 4 years ago
- Scrapy extension which writes crawled items to Kafka ☆31 · Apr 8, 2026 · Updated last month
- Modularly extensible semantic metadata validator ☆85 · Dec 10, 2015 · Updated 10 years ago
- Tagging and annotation framework for scan data ☆100 · Oct 16, 2018 · Updated 7 years ago
- This project deals with hierarchical classification of web pages based on dmoz dataset. ☆14 · Apr 10, 2014 · Updated 12 years ago
- A Corpus Data Retrieval Index using Lucene for Look-Ups ☆20 · Updated this week
- Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL). ☆1,998 · Apr 21, 2026 · Updated 2 weeks ago
- Feed discovery to share :) ☆41 · Oct 28, 2016 · Updated 9 years ago
- Fast multi-keyword search engine for text strings ☆258 · Sep 14, 2024 · Updated last year
- Simhash and near-duplicate detection ☆423 · May 15, 2023 · Updated 2 years ago
- Data science tools from Moz ☆23 · Jan 11, 2017 · Updated 9 years ago
- Django feeds provides an extensive database model for RSS feeds and a fault tolerant parser. ☆30 · Jun 14, 2012 · Updated 13 years ago
- Web crawler ☆21 · Nov 18, 2017 · Updated 8 years ago
- Extract countries, regions and cities from a URL or text ☆216 · Sep 10, 2020 · Updated 5 years ago
- URL normalization for Python ☆100 · Apr 25, 2026 · Updated 2 weeks ago
- htcap is a web application scanner able to crawl single page application (SPA) in a recursive manner by intercepting ajax calls and DOM c… ☆18 · Sep 23, 2025 · Updated 7 months ago
- Decred: On-chain atomic swaps for Viacoin, Litecoin and other cryptocurrencies. ☆12 · Jan 30, 2023 · Updated 3 years ago
- Image processing and image analysis software. (Mirror of source) ☆21 · Mar 19, 2011 · Updated 15 years ago
- Collects multimedia content shared through social networks. ☆19 · Feb 18, 2015 · Updated 11 years ago
- Decentralized DNS fuzzer to mitigate ISP Snooping ☆12 · May 3, 2017 · Updated 9 years ago
- [DEPRECATED] Unofficial Python Pandas DataReader objects with requests and requests_cache ☆16 · Mar 5, 2018 · Updated 8 years ago
- Maltego integration of https://abusix.com ☆16 · Sep 2, 2018 · Updated 7 years ago
- Parse domains using the TLD list maintained by publicsuffix.org ☆62 · Jul 28, 2020 · Updated 5 years ago
- A scalable frontier for web crawlers ☆1,330 · Jun 6, 2025 · Updated 11 months ago
- A bunch of scripts used for network defense during competitions. ☆15 · Apr 3, 2015 · Updated 11 years ago
- A recommender system for GitHub repositories ☆14 · Jun 21, 2014 · Updated 11 years ago
- pylinkvalidator is a standalone and pure python link validator and crawler that traverses a web site and reports errors (e.g., 500 and 40… ☆147 · May 17, 2019 · Updated 6 years ago