cfhamlet / os-urlpattern
Unsupervised URL clustering; generates and matches URL patterns.
☆48 Updated 5 years ago
Related projects
Alternatives and complementary repositories for os-urlpattern
- An efficient simhash implementation for Python ☆124 Updated 5 years ago
- Use pyppeteer from a Scrapy spider ☆60 Updated 4 years ago
- Compare HTML similarity using structural and style metrics ☆210 Updated last year
- Package to facilitate URL clustering ☆68 Updated 8 years ago
- A complimentary proxy to help to use SPM with headless browsers ☆110 Updated last year
- Fast Python Bloom Filter using Mmap ☆13 Updated 12 years ago
- Fast Redis Bloom Filters in Python ☆289 Updated 5 years ago
- Kafka-based components for Scrapy ☆79 Updated 6 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- A generic crawler ☆78 Updated 6 years ago
- Scrapy schema validation pipeline and Item builder using JSON Schema ☆44 Updated 3 years ago
- Sentry component for Scrapy ☆86 Updated last year
- A project to attempt to automatically login to a website given a single seed ☆11 Updated 4 months ago
- ☆29 Updated 3 years ago
- Python extension module for accelerating regular expressions using libesm ☆132 Updated last year
- SOCKS{4,4a,5} endpoints for Twisted ☆58 Updated 4 years ago
- Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python ☆269 Updated 3 weeks ago
- Pyppeteer integration for Scrapy ☆60 Updated 3 years ago
- Tool to flatten stream of JSON-like objects, configured via schema ☆33 Updated 5 years ago
- Python bindings for CLD2. ☆17 Updated 6 years ago
- A fork of http://pydispatcher.sourceforge.net/ with PyPy support ☆16 Updated 7 years ago
- A decorator to write coroutine-like spider callbacks. ☆110 Updated last year
- Scrapy + Puppeteer ☆111 Updated 3 years ago
- Python bloom filter using Redis as a shared backend. ☆19 Updated 7 years ago
- Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations ☆40 Updated 5 months ago
- MongoDB Python logging handler, Centralized logging made simple using MongoDB. ☆135 Updated 5 years ago
- Scrapy spider middleware to clean up query parameters in request URLs ☆25 Updated 8 years ago
- Output Scrapy statistics to graphite/carbon ☆54 Updated 11 years ago
- Scrapy middleware for the autologin ☆37 Updated 6 years ago