brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆380 · Updated 2 years ago
Alternatives and similar repositories for supercrawler:
Users interested in supercrawler are comparing it to the libraries listed below.
- Web crawler for Node.js ☆253 · Updated 6 years ago
- Google Search SERP Scraper ☆109 · Updated last year
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆430 · Updated 2 years ago
- Declarative DOM extraction expression evaluator. 👨‍⚕️ ☆696 · Updated 4 years ago
- JavaScript scraping module based on puppeteer for many different search engines... ☆559 · Updated 2 years ago
- Flexible event driven crawler for node. ☆2,141 · Updated 4 years ago
- Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated) ☆502 · Updated 4 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆121 · Updated 2 years ago
- Node.js lib to parse Google SERP HTML pages ☆47 · Updated last year
- Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining. ☆909 · Updated last week
- Simple, lightweight and expressive web scraping with Node.js ☆154 · Updated 3 years ago
- Email automation driven by headless chrome. ☆166 · Updated 4 years ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 · Updated 3 years ago
- Starter Kit for running Headless-Chrome by Puppeteer on AWS Lambda. ☆580 · Updated 5 years ago
- Proxies Puppeteer Page requests. ☆208 · Updated 8 months ago
- Puppeteer (Headless Chrome Node API)-based rendering solution. ☆538 · Updated 2 years ago
- Easily create XML sitemaps for your website. ☆431 · Updated 10 months ago
- Node.js web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support… ☆112 · Updated 2 years ago
- Automatically extracts structured information from webpages ☆108 · Updated 2 years ago
- Advanced Node proxy checker (node proxy verifier, node proxy tester) with socks and https support ☆109 · Updated 2 years ago
- Cloud crawler functions for scrapeulous ☆45 · Updated 4 years ago
- Verify an email address by checking MX records and the SMTP connection. ☆124 · Updated 3 years ago
- Scrape/crawl articles from any site automatically. Make any web page readable, whether Chinese or English. ☆344 · Updated 6 years ago
- Additional module to use with 'puppeteer' for setting proxies on a per-page basis. ☆444 · Updated 10 months ago
- Puppeteer as a service ☆455 · Updated 2 years ago
- A Node.js module to search and scrape Google. ☆454 · Updated 6 years ago
- Automatically extract body content (and other cool stuff) from an HTML document ☆2,157 · Updated last year
- Generate an object for testing whether a request is sent; "request" here is Mikeal's request. ☆44 · Updated 4 years ago
- Robust RSS, Atom, and RDF feed parsing in Node.js ☆1,971 · Updated last year
- A complete and versatile web scraper. ☆3,718 · Updated 4 years ago