cfhamlet / os-urlpattern
Unsupervised URL clustering; generates and matches URL patterns.
☆49 · Updated 6 years ago
Alternatives and similar repositories for os-urlpattern
Users who are interested in os-urlpattern are comparing it to the libraries listed below.
- Fast Redis Bloom Filters in Python ☆290 · Updated 6 years ago
- Use pyppeteer from a Scrapy spider ☆59 · Updated 5 years ago
- Compare html similarity using structural and style metrics ☆212 · Updated 2 years ago
- Package to facilitate URL clustering ☆67 · Updated 9 years ago
- A complementary proxy to help use SPM with headless browsers ☆108 · Updated 2 years ago
- Python bloom filter using redis as a shared backend. ☆19 · Updated 7 years ago
- Sentry component for Scrapy ☆86 · Updated last year
- Scriptable Google Chrome™ as an HTTP service + asyncio driver ☆119 · Updated last year
- Simple Web UI for Scrapy spider management via Scrapyd ☆51 · Updated 6 years ago
- This module is a Python Library that enables the user to find the country, region, city, coordinates, zip code, ISP, domain name, timezon… ☆149 · Updated 2 weeks ago
- Pyppeteer integration for Scrapy ☆58 · Updated 4 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 · Updated 6 years ago
- Kafka-based components for Scrapy ☆79 · Updated 7 years ago
- Selenium Chrome and Firefox automated browser tips for blocking images, geotagging, etc... ☆35 · Updated 6 years ago
- Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python ☆277 · Updated 2 months ago
- SOCKS{4,4a,5} endpoints for twisted ☆59 · Updated 5 years ago
- Scrapy + Puppeteer ☆110 · Updated 3 years ago
- Fast multi-keyword search engine for text strings ☆255 · Updated 8 months ago
- A simple algorithm for clustering web pages, suitable for crawlers ☆34 · Updated 8 years ago
- A more elegant way to process streaming data ☆31 · Updated 7 years ago
- 🐍 A CPython extension for the Hyperscan regular expression matching library. ☆178 · Updated last week
- A generic crawler ☆78 · Updated 7 years ago
- A project to attempt to automatically login to a website given a single seed ☆11 · Updated 11 months ago
- NER toolkit for HTML data ☆259 · Updated last year
- Output scrapy statistics to graphite/carbon ☆54 · Updated 12 years ago
- Scrapy schema validation pipeline and Item builder using JSON Schema ☆44 · Updated 4 years ago
- Splash + HAProxy + Docker Compose ☆197 · Updated 6 years ago
- Tool to flatten stream of JSON-like objects, configured via schema ☆33 · Updated 5 years ago
- A collection of pipelines for Scrapy ☆16 · Updated 2 months ago
- MongoDB Python logging handler, Centralized logging made simple using MongoDB. ☆135 · Updated 6 years ago