scrapinghub / autoextract-spiders
Pre-built Scrapy spiders for AutoExtract
☆19 Updated 11 months ago
Alternatives and similar repositories for autoextract-spiders:
Users who are interested in autoextract-spiders are comparing it to the libraries listed below:
- Zyte Automatic Extraction integration for Scrapy ☆56 Updated 3 years ago
- Python clients for Zyte AutoExtract API ☆40 Updated 3 years ago
- ☆29 Updated 3 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 6 years ago
- Extract text from HTML ☆135 Updated 4 years ago
- A complementary proxy to help use SPM with headless browsers ☆108 Updated last year
- Simple Web UI for Scrapy spider management via Scrapyd ☆51 Updated 6 years ago
- Python package for performing deduplication using flexible text matching and cleaning in pandas DataFrames ☆25 Updated 4 years ago
- Broad crawler for domain discovery ☆19 Updated 6 years ago
- A middleware layer for Scrapy that detects CAPTCHA tests and solves them ☆45 Updated last year
- A simple, Qt-Webengine powered web browser with built-in functionality for basic Scrapy web scraping support. ☆109 Updated 10 months ago
- ☆14 Updated this week
- API - extract a list of keywords from a text. ☆18 Updated 7 years ago
- A Python package to get useful information from documents using TopicRank Algorithm. ☆16 Updated last year
- Extract dates from text ☆64 Updated 4 years ago
- Scrapy middleware for autologin ☆37 Updated 6 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- Scalable String Similarity Joins in Python ☆39 Updated 9 months ago
- The most advanced debugging and testing tool for Scrapy ☆16 Updated last year
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Extract city and country mentions from text like GeoText, using FlashText (an Aho-Corasick implementation) instead of regex. ☆62 Updated last week
- Extract data from HTML tables ☆86 Updated 4 years ago
- Extract the differences between two HTML pages ☆32 Updated 6 years ago
- ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of diff… ☆88 Updated 3 years ago
- A collection of pipelines for Scrapy ☆16 Updated 2 weeks ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 Updated last year
- Scrapy pipeline which allows you to store Scrapy items in an appery.io database. ☆14 Updated 7 years ago
- Python based Wikidata framework for easy dataframe extraction ☆43 Updated last year