brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆379Updated 2 years ago
Alternatives and similar repositories for supercrawler:
Users who are interested in supercrawler are comparing it to the libraries listed below:
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.☆423Updated 2 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.☆119Updated last year
- Flexible event driven crawler for node.☆2,142Updated 3 years ago
- Email automation driven by headless chrome.☆163Updated 4 years ago
- Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated)☆501Updated 4 years ago
- Declarative DOM extraction expression evaluator. 👨‍⚕️☆696Updated 4 years ago
- Scrape/crawl articles from any site automatically. Make any web page readable, whether Chinese or English.☆343Updated 6 years ago
- Chromium / Puppeteer site crawler☆48Updated 4 years ago
- Google Search SERP Scraper☆107Updated last year
- Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSO…☆150Updated last year
- A Node.js module to search and scrape Google.☆454Updated 6 years ago
- Cloud crawler functions for scrapeulous☆45Updated 3 years ago
- Web crawler for Node.JS☆253Updated 6 years ago
- Automatically extract body content (and other cool stuff) from an html document☆2,154Updated last year
- Small tool to wait until all XHRs have finished in Puppeteer☆277Updated 2 weeks ago
- Google search scraper with captcha solving support☆91Updated 5 years ago
- Simple multi-level scraper with JSON input/output for Cheerio☆199Updated 2 years ago
- Advanced Node proxy checker (node proxy verifier, node proxy tester) with socks and https support☆109Updated 2 years ago
- Run Puppeteer code in the cloud☆734Updated 11 months ago
- Node module that summarizes text using a naive summarization algorithm☆770Updated 4 months ago
- Automatically extracts structured information from webpages☆107Updated 2 years ago
- Blazingly fast, multi tenant, faceted search API☆310Updated 4 years ago
- A complete and versatile web scraper.☆3,713Updated 4 years ago
- Get a random user agent (with an optional filter to select from a specific set of user agents).☆254Updated 2 years ago
- A node server and module which allows for cross-domain page scraping on web documents with JSONP or POST.☆746Updated 9 months ago
- Light Tor proxy wrapper for the request library☆312Updated 2 years ago
- Scrapoxy is a super proxies manager that orchestrates all your proxies into one place, rather than spreading management across multiple s…☆2,170Updated this week
- Puppeteer (headless Chrome Node API) based web page renderer☆318Updated 4 months ago
- High-performance FlexSearch Server for Node.js (Cluster)☆188Updated 6 years ago
- Is headless Chrome currently detectable? Let's pit the detections and detection evasions against each other.☆649Updated 3 years ago