brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆380 Updated 2 years ago
Alternatives and similar repositories for supercrawler:
Users who are interested in supercrawler are comparing it to the libraries listed below.
- Declarative DOM extraction expression evaluator. 👨‍⚕️ ☆696 Updated 4 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆428 Updated 2 years ago
- Google Search SERP Scraper ☆108 Updated last year
- Scrape/crawl articles from any site automatically. Make any web page readable, whether Chinese or English. ☆344 Updated 6 years ago
- Flexible event-driven crawler for Node. ☆2,141 Updated 4 years ago
- Verify an email address by checking MX records and the SMTP connection. ☆124 Updated 3 years ago
- Puppeteer (Headless Chrome Node API)-based rendering solution. ☆536 Updated 2 years ago
- Automatically extracts structured information from webpages ☆108 Updated 2 years ago
- JavaScript scraping module based on Puppeteer for many different search engines... ☆558 Updated 2 years ago
- Node module that summarizes text using a naive summarization algorithm ☆770 Updated 6 months ago
- Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated) ☆502 Updated 4 years ago
- Automatically extract body content (and other cool stuff) from an HTML document ☆2,156 Updated last year
- Puppeteer as a service ☆453 Updated 2 years ago
- Web crawler for Node.js ☆253 Updated 6 years ago
- Is headless Chrome currently detectable? Let's pit the detections and detection evasions against each other. ☆655 Updated 3 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆121 Updated 2 years ago
- Run Puppeteer code in the cloud ☆737 Updated last year
- A complete and versatile web scraper. ☆3,716 Updated 4 years ago
- Puppeteer (headless Chrome Node API)-based web page renderer ☆320 Updated 6 months ago
- Small tool to wait until all XHRs have finished in Puppeteer ☆277 Updated last week
- Google search scraper with captcha solving support ☆91 Updated 5 years ago
- Good Enough Recommendation (GER) Engine ☆381 Updated last year
- Example project demonstrating Headless Chrome + Puppeteer running in their own individual containers. ☆71 Updated 2 years ago
- Blazingly fast, multi-tenant, faceted search API ☆310 Updated 4 years ago
- Simple, lightweight and expressive web scraping with Node.js ☆154 Updated 3 years ago
- Plugin to extract keywords and key-phrases ☆333 Updated 5 months ago
- Highly scalable Node.js scraping framework for mobsters ☆298 Updated 2 years ago
- Node.js email SMTP verification, powered by EmailChecker.com API ☆288 Updated 2 years ago
- Node.js library to parse Google SERP HTML pages ☆47 Updated last year
- A Node.js module to search and scrape Google. ☆454 Updated 6 years ago