brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆381 · Updated 2 years ago
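The description above names four behaviours: content handlers, robots.txt compliance, rate limiting and a concurrency cap. The TypeScript below is a rough, self-contained sketch of how those pieces can fit together. It is not supercrawler's API. Every name in it (`crawl`, `disallowedPrefixes`, `htmlLinkHandler`, `fetchPage`, the option fields) is hypothetical, and it crawls an in-memory fake site instead of making network requests. Its robots.txt handling is deliberately simplified: it takes the union of matching groups and ignores `Allow` and wildcards.

```typescript
// Sketch of a crawler with content-type handlers, robots.txt rules, a minimum
// interval between request starts, and a concurrency cap.
// NOT supercrawler's API: all names here are hypothetical.

type Page = { url: string; contentType: string; body: string };
type Handler = (page: Page) => string[]; // returns URLs discovered on the page
type Fetcher = (url: string) => Promise<Page>;

// Simplified robots.txt reader: collects Disallow prefixes from every group whose
// User-agent is "*" or a substring of ours. The real standard picks the single most
// specific group and also supports Allow rules and wildcards.
function disallowedPrefixes(robotsTxt: string, userAgent: string): string[] {
  const rules: string[] = [];
  let groupApplies = false;
  let prevWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      const matches = value === "*" ||
        (value !== "" && userAgent.toLowerCase().includes(value.toLowerCase()));
      // Consecutive User-agent lines share one group of rules.
      groupApplies = prevWasAgent ? groupApplies || matches : matches;
      prevWasAgent = true;
    } else {
      prevWasAgent = false;
      if (groupApplies && field === "disallow" && value !== "") rules.push(value);
    }
  }
  return rules;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function crawl(
  seeds: string[],
  fetchPage: Fetcher,
  handlers: Map<string, Handler>,
  disallowed: string[],
  opts: { intervalMs: number; concurrency: number; maxPages: number },
): Promise<string[]> {
  const queue = [...seeds];
  const seen = new Set(seeds);
  const visited: string[] = [];
  let started = 0;   // requests begun so far (caps the crawl at maxPages)
  let nextStart = 0; // earliest timestamp at which the next request may begin

  async function worker(): Promise<void> {
    while (queue.length > 0 && started < opts.maxPages) {
      const url = queue.shift()!;
      const path = new URL(url).pathname;
      if (disallowed.some((prefix) => path.startsWith(prefix))) continue; // obey robots.txt
      started++;
      // Reserve a start slot so request starts are at least intervalMs apart.
      const now = Date.now();
      const start = Math.max(now, nextStart);
      nextStart = start + opts.intervalMs;
      await sleep(start - now);
      const page = await fetchPage(url);
      visited.push(url);
      // Dispatch to the handler registered for this content type, if any.
      const handler = handlers.get(page.contentType);
      for (const link of handler ? handler(page) : []) {
        if (!seen.has(link)) { seen.add(link); queue.push(link); }
      }
    }
  }
  // A fixed pool of workers bounds the number of requests in flight.
  await Promise.all(Array.from({ length: opts.concurrency }, () => worker()));
  return visited;
}

// Hypothetical "text/html" handler: regex-extracts href values, keeps same-host links.
function htmlLinkHandler(page: Page): string[] {
  const host = new URL(page.url).host;
  const links: string[] = [];
  const re = /href="([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(page.body)) !== null) {
    const abs = new URL(m[1], page.url);
    if (abs.host === host) links.push(abs.href);
  }
  return links;
}

// Usage against an in-memory fake site, so no network access is needed.
const site: Record<string, string> = {
  "https://example.com/": '<a href="/a">A</a> <a href="/private/x">X</a>',
  "https://example.com/a": '<a href="/">Home</a> <a href="/b">B</a>',
  "https://example.com/b": "No links here.",
};
const fetchPage: Fetcher = async (url) => ({ url, contentType: "text/html", body: site[url] ?? "" });
const handlers = new Map<string, Handler>([["text/html", htmlLinkHandler]]);
const disallowed = disallowedPrefixes("User-agent: *\nDisallow: /private/", "ExampleBot/1.0");

crawl(["https://example.com/"], fetchPage, handlers, disallowed,
  { intervalMs: 100, concurrency: 2, maxPages: 50 })
  .then((visited) => console.log(visited)); // "/private/x" is skipped
```

Running handlers per content type keeps parsing logic pluggable, while the shared `nextStart` timestamp spaces out request starts across all workers rather than per worker.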
Alternatives and similar repositories for supercrawler
Users who are interested in supercrawler are comparing it to the libraries listed below.
- Google Search SERP Scraper ☆112 · Updated last year
- Scrape/crawl articles from any site automatically. Make any web page readable, whether Chinese or English. ☆344 · Updated 6 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆430 · Updated 2 years ago
- Email automation driven by headless chrome. ☆167 · Updated 4 years ago
- Declarative DOM extraction expression evaluator. 👨⚕️ ☆694 · Updated 4 years ago
- Puppeteer (Headless Chrome Node API)-based rendering solution. ☆540 · Updated 2 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆121 · Updated 2 years ago
- Web crawler for Node.JS ☆253 · Updated 6 years ago
- Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated) ☆499 · Updated 4 years ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON. ☆70 · Updated 3 years ago
- Flexible event-driven crawler for Node. ☆2,142 · Updated 4 years ago
- Run Puppeteer code in the cloud ☆737 · Updated last year
- Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSO… ☆150 · Updated 2 years ago
- Automatically extracts structured information from webpages ☆109 · Updated 2 years ago
- A complete and versatile web scraper. ☆3,717 · Updated 4 years ago
- Simple multi-level scraper with JSON input/output for Cheerio ☆199 · Updated 2 years ago
- Automatically extract body content (and other cool stuff) from an html document ☆2,157 · Updated 2 years ago
- Javascript scraping module based on puppeteer for many different search engines... ☆559 · Updated 2 years ago
- Node.js email SMTP verification, powered by EmailChecker.com API ☆289 · Updated 2 years ago
- Node.js module to create and trigger your own webHooks. ☆192 · Updated last year
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆191 · Updated 3 years ago
- Cloud crawler functions for scrapeulous ☆45 · Updated 4 years ago
- Crawler for LinkedIn full profiles 2019 ☆215 · Updated 4 years ago
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,208 · Updated last year
- All of the supporting materials for articles from Intoli's blog. ☆274 · Updated 2 years ago
- Plugin to extract keywords and key-phrases ☆333 · Updated 7 months ago
- Node module that summarizes text using a naive summarization algorithm ☆769 · Updated 7 months ago
- Nodejs lib to parse Google SERP html pages ☆47 · Updated last year
- A curated list of awesome puppeteer resources. ☆2,480 · Updated 10 months ago
- Amazon crawler - this configuration will extract items for a keywords that you will specify in the input, and it will automatically extra… ☆76 · Updated 4 years ago