scrapinghub / scrapy
Scrapy, a fast high-level screen scraping and web crawling framework for Python.
☆25 Updated 8 years ago
Related projects
Alternatives and complementary repositories for scrapy
- Extensions for using Scrapy on Amazon AWS ☆32 Updated 11 years ago
- Paginating the web ☆37 Updated 10 years ago
- A Python library to detect and extract listing data from HTML pages. ☆109 Updated 7 years ago
- Sheetsu API documentation. https://docs.sheetsu.com ☆10 Updated 6 years ago
- ☆49 Updated 2 years ago
- ☆32 Updated 10 months ago
- Scrapes public information from LinkedIn ☆110 Updated 8 years ago
- Exporters is an extensible export pipeline library that supports filtering, transformation, and several sources and destinations ☆40 Updated 6 months ago
- Scrapy pipeline which allows you to store Scrapy items in a Solr server. ☆19 Updated 8 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated 9 months ago
- Matches a category of Google's Taxonomy to a product that is described in any kind of text data ☆59 Updated 6 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 2 years ago
- An all-in-one toolbox made for SEOs ☆200 Updated 11 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 Updated 9 years ago
- Feed discovery to share :) ☆40 Updated 8 years ago
- Python email address and MIME parsing library ☆8 Updated 2 years ago
- Deployment Automation Engine ☆27 Updated 3 months ago
- ☆224 Updated 9 years ago
- Restrict crawl and scraping scope using matchers. ☆25 Updated 8 years ago
- A library to parse the Wayback Machine of archive.org to get historical views of web pages. It is a useful tool to research on the evolutio… ☆20 Updated 5 years ago
- A simple algorithm for clustering web pages, suitable for crawlers ☆34 Updated 7 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- ScraperWiki Python library for scraping and saving data ☆159 Updated last year
- Python implementation of the Parsley language for extracting structured data from web pages ☆92 Updated 7 years ago
- A simple tool to organize my roadmaps. ☆19 Updated last year
- The delegation details of top-level domains ☆36 Updated 2 weeks ago
- A Singer.io tap for extracting data from the Pipedrive API ☆13 Updated 2 months ago
- Traptor -- A distributed Twitter feed ☆26 Updated 2 years ago
- Unified CLI for various SaaS image classification APIs. ☆40 Updated 7 years ago
- A distributed system for mining Common Crawl using SQS, AWS-EC2 and S3 ☆14 Updated 10 years ago