brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆376 Updated last year
Related projects
Alternatives and complementary repositories for supercrawler
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆415 Updated last year
- JavaScript scraping module based on Puppeteer for many different search engines... ☆548 Updated last year
- Web scraping library made by the Phantombuster team. Modern, simple & works on all websites. (Deprecated) ☆501 Updated 4 years ago
- Scrape or crawl articles from any site automatically. Makes any web page readable, whether Chinese or English. ☆343 Updated 6 years ago
- Declarative DOM extraction expression evaluator. 👨‍⚕️ ☆695 Updated 4 years ago
- Web crawler for Node.js ☆253 Updated 6 years ago
- Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining. ☆849 Updated 3 weeks ago
- Email automation driven by headless Chrome. ☆164 Updated 3 years ago
- Library and CLI for automating captcha verification across multiple providers. ☆122 Updated 4 years ago
- Automatically extract body content (and other cool stuff) from an HTML document ☆2,150 Updated last year
- Google Search SERP Scraper ☆104 Updated last year
- Puppeteer (Headless Chrome Node API)-based rendering solution. ☆527 Updated 2 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases. ☆117 Updated last year
- A collection of awesome web scrapers and crawlers.