A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆382 · Dec 30, 2022 · Updated 3 years ago
Alternatives and similar repositories for supercrawler
Users who are interested in supercrawler are comparing it to the libraries listed below.
- Flexible event driven crawler for node. ☆2,136 · Mar 7, 2021 · Updated 5 years ago
- Web Crawler/Spider for NodeJS + server-side jQuery ;-) ☆6,788 · May 28, 2025 · Updated 9 months ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆193 · Apr 29, 2022 · Updated 3 years ago
- Distributed crawler powered by Headless Chrome ☆5,706 · Apr 29, 2023 · Updated 2 years ago
- A scalable, mature and versatile web crawler based on Apache Storm ☆972 · Updated this week
- A collection of awesome web crawlers and spiders in different languages ☆7,148 · Jun 16, 2024 · Updated last year
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 · Jun 8, 2021 · Updated 4 years ago
- A scalable frontier for web crawlers ☆1,330 · Jun 6, 2025 · Updated 9 months ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆438 · Dec 30, 2022 · Updated 3 years ago
- The next web scraper. See through the <html> noise. ☆5,906 · Feb 16, 2026 · Updated last month
- 🔮 A Node.js scraper for humans. ☆4,072 · Oct 13, 2025 · Updated 5 months ago
- ☆10 · Dec 23, 2019 · Updated 6 years ago
- Generate an object for testing if a request is sent, request is Mikeal's request. ☆44 · Oct 15, 2020 · Updated 5 years ago
- Web scraper for NodeJS ☆4,117 · Dec 13, 2023 · Updated 2 years ago
- A job scraper using the Scrapy framework ☆16 · Oct 20, 2017 · Updated 8 years ago
- Floodesh is a distributed web spider written with Nodejs. ☆13 · Sep 4, 2020 · Updated 5 years ago
- Basic integration between GraphQL-RxJs & GraphQL-Transport-WS ☆12 · Nov 3, 2017 · Updated 8 years ago
- Deprecated. Use https://github.com/no-shot/env instead! ☆11 · May 31, 2021 · Updated 4 years ago
- A simple collaborative textfield for nodejs ☆18 · Apr 26, 2017 · Updated 8 years ago
- Javascript scraping module based on puppeteer for many different search engines... ☆568 · Dec 30, 2022 · Updated 3 years ago
- Table Sorter ☆21 · Feb 28, 2017 · Updated 9 years ago
- A CDN/API service for Undraw, the MIT-licensed illustrations by Katerina Limpitsouni ☆12 · Aug 3, 2019 · Updated 6 years ago
- A plugin for puppeteer-extra to add proxy support ☆18 · Dec 30, 2022 · Updated 3 years ago
- Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆22,366 · Updated this week
- Lightweight API for YouTube (Google API v3) ☆16 · Dec 6, 2025 · Updated 3 months ago
- Broad crawler for domain discovery ☆20 · Feb 10, 2026 · Updated last month
- gentle forced aligner ☆11 · Apr 25, 2024 · Updated last year
- IRL version of Chrome Offline T-Rex game ☆12 · Apr 16, 2025 · Updated 11 months ago
- Scrapoxy has been discontinued. ☆2,425 · Feb 7, 2026 · Updated last month
- A Node.js module to search and scrape Google. ☆456 · Oct 4, 2018 · Updated 7 years ago
- NER toolkit for HTML data ☆259 · May 3, 2024 · Updated last year
- 📦 A set of small and performant JS and Twig components ☆11 · Mar 10, 2026 · Updated last week
- CQRS example with Go, MySQL, NATS, ElasticSearch ☆11 · Jun 1, 2018 · Updated 7 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Jun 12, 2020 · Updated 5 years ago
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,230 · Nov 7, 2023 · Updated 2 years ago
- Ultimate Website Sitemap Parser ☆246 · Jan 25, 2026 · Updated last month
- A complete and versatile web scraper. ☆3,720 · Oct 18, 2020 · Updated 5 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆24 · Feb 10, 2026 · Updated last month
- A page scraping DSL for extracting structured information from unstructured XHTML, built on Node.js and jQuery ☆49 · Jan 9, 2015 · Updated 11 years ago