scrapinghub / shub-workflow
☆15 Updated 2 weeks ago
Alternatives and similar repositories for shub-workflow
Users who are interested in shub-workflow are comparing it to the libraries listed below
- Pre-built Scrapy spiders for AutoExtract ☆19 Updated last year
- A complimentary proxy to help to use SPM with headless browsers ☆108 Updated 2 years ago
- Zyte Automatic Extraction integration for Scrapy ☆56 Updated 3 years ago
- Python clients for Zyte AutoExtract API ☆40 Updated 3 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated 2 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- Scrapy extension which writes crawled items to Kafka ☆30 Updated 6 years ago
- ☆50 Updated 3 years ago
- Simple Web UI for Scrapy spider management via Scrapyd ☆51 Updated 7 years ago
- URL Inspection Tool Automator ☆24 Updated 2 years ago
- A component that tries to avoid downloading duplicate content ☆27 Updated 7 years ago
- code and data used to build a training dataset for dragnet models ☆10 Updated 4 years ago
- Scrapy middleware which allows crawling only new content ☆80 Updated 2 years ago
- Scalable String Similarity Joins in Python ☆39 Updated last year
- Python client for Zyte API ☆26 Updated last month
- SEMRush SERP Tutorial. Using advertools to Extract and Analyze Search Engine Results Pages Data ☆13 Updated 6 years ago
- Given a new image, determine if it is likely derived from a known image. ☆20 Updated 7 years ago
- Integrate Watson Studio and Watson Campaign Automation to tailor your target audience for effective campaigns ☆12 Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆23 Updated 4 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated last year
- Common interface for data container classes ☆68 Updated 3 weeks ago
- ☆16 Updated 8 years ago
- Open Collaborative AI Driven Parser builder for Web Scraping, Data Extraction and Crawling, Knowledge Graph ☆1 Updated 5 months ago
- Neural Elastic Inference and Search ☆19 Updated 5 years ago
- Web scraping Page Objects core library ☆102 Updated 2 weeks ago
- A workflow system for Natural Language Processing. ☆21 Updated 5 years ago
- Play the card game Baccarat ☆14 Updated last year
- Matches a category of Google's Taxonomy to product that is described in any kind of text data ☆62 Updated 6 years ago
- ☆29 Updated 4 years ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆57 Updated last year