brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆380 Updated 2 years ago
Alternatives and similar repositories for supercrawler:
Users that are interested in supercrawler are comparing it to the libraries listed below.
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆426 Updated 2 years ago
- Web crawler for Node.js ☆253 Updated 6 years ago
- Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated) ☆502 Updated 4 years ago
- Declarative DOM extraction expression evaluator. 👨‍⚕️ ☆696 Updated 4 years ago
- Verify an email address by checking MX records and the SMTP connection. ☆124 Updated 3 years ago
- Node module that summarizes text using a naive summarization algorithm ☆770 Updated 5 months ago
- JavaScript scraping module based on Puppeteer for many different search engines... ☆557 Updated 2 years ago
- Puppeteer (Headless Chrome Node API)-based rendering solution. ☆535 Updated 2 years ago
- Automatically extract body content (and other cool stuff) from an HTML document ☆2,156 Updated last year
- Flexible event-driven crawler for Node. ☆2,141 Updated 4 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆120 Updated last year
- Node.js email SMTP verification, powered by EmailChecker.com API ☆286 Updated 2 years ago
- Crawler for LinkedIn full profiles 2019 ☆215 Updated 4 years ago
- Instagram automation driven by headless Chrome. ☆119 Updated 2 years ago
- A Node.js module to search and scrape Google. ☆454 Updated 6 years ago
- Google Search SERP Scraper ☆107 Updated last year
- Highly scalable Node.js scraping framework for mobsters ☆298 Updated 2 years ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆69 Updated 3 years ago
- Automatically extracts structured information from webpages ☆108 Updated 2 years ago
- Scrape or crawl articles from any site automatically. Makes any web page readable, whether Chinese or English. ☆343 Updated 6 years ago
- Docker image with Google Puppeteer installed ☆484 Updated 3 years ago
- Small tool that waits until all XHR requests have finished in Puppeteer ☆277 Updated last month
- Cloud crawler functions for scrapeulous ☆45 Updated 4 years ago
- Run Puppeteer code in the cloud ☆735 Updated last year
- Simple, lightweight and expressive web scraping with Node.js ☆154 Updated 3 years ago
- Instagram bot in Node.js. ☆128 Updated 2 years ago
- Starter Kit for running Headless-Chrome by Puppeteer on AWS Lambda. ☆581 Updated 5 years ago
- Chromium / Puppeteer site crawler ☆48 Updated 4 years ago
- Google search scraper with captcha solving support ☆91 Updated 5 years ago
- Message queues which use MongoDB. ☆207 Updated last year