A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.
☆382 · Dec 30, 2022 · Updated 3 years ago
Alternatives and similar repositories for supercrawler
Users that are interested in supercrawler are comparing it to the libraries listed below.
- Web Crawler/Spider for NodeJS + server-side jQuery ;-) ☆6,785 · May 28, 2025 · Updated 11 months ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ☆194 · Apr 29, 2022 · Updated 4 years ago
- Distributed crawler powered by Headless Chrome ☆5,637 · Apr 29, 2023 · Updated 3 years ago
- Crawl websites for accessibility issues from the command line. ☆15 · Apr 17, 2020 · Updated 6 years ago
- NodeJS robots.txt parser with support for wildcard (*) matching. ☆166 · May 12, 2026 · Updated last week
- A collection of awesome web crawlers and spiders in different languages ☆7,195 · Jun 16, 2024 · Updated last year
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON. ☆70 · Jun 8, 2021 · Updated 4 years ago
- A scalable frontier for web crawlers ☆1,329 · Jun 6, 2025 · Updated 11 months ago
- Visually diff websites ☆20 · Jan 22, 2018 · Updated 8 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data. ☆49 · Feb 15, 2017 · Updated 9 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆439 · Dec 30, 2022 · Updated 3 years ago
- The next web scraper. See through the <html> noise. ☆5,906 · May 6, 2026 · Updated 2 weeks ago
- Automatically extracts structured information from webpages ☆111 · Jun 23, 2022 · Updated 3 years ago
- A job scraper using the Scrapy framework ☆16 · Oct 20, 2017 · Updated 8 years ago
- A simple component to check the status of a domain (whois, availability, expired, PR, TrustFlow, ...) ☆33 · Dec 5, 2016 · Updated 9 years ago
- Puppeteer Pool, run a cluster of instances in parallel ☆3,516 · Mar 1, 2026 · Updated 2 months ago
- Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head ☆174 · May 19, 2020 · Updated 6 years ago
- A lightweight robots.txt parser for Node.js with support for wildcards, caching and promises. ☆15 · Jun 23, 2023 · Updated 2 years ago
- A simple collaborative textfield for nodejs ☆18 · Apr 26, 2017 · Updated 9 years ago
- Javascript scraping module based on puppeteer for many different search engines... ☆569 · Dec 30, 2022 · Updated 3 years ago
- Table Sorter ☆21 · Feb 28, 2017 · Updated 9 years ago
- A plugin for puppeteer-extra to add proxy support ☆18 · Dec 30, 2022 · Updated 3 years ago
- This is a mod of the Particletree PHP Quick Profiler that can be used with CodeIgniter. ☆22 · Jan 11, 2012 · Updated 14 years ago
- Lightweight API for YouTube (Google API v3) ☆16 · Dec 6, 2025 · Updated 5 months ago
- Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆23,308 · Updated this week
- Schema.io + Node API starter kit ☆15 · Nov 26, 2017 · Updated 8 years ago
- Scrapoxy has been discontinued. ☆2,421 · Feb 7, 2026 · Updated 3 months ago
- Asynchronous Web Requests in Python. ☆32 · Dec 5, 2020 · Updated 5 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆53 · Jun 12, 2020 · Updated 5 years ago
- Redux reducer and actions to get posts, users and tags from a Ghost Blog Public Api (https://ghost.org) ☆10 · Sep 17, 2018 · Updated 7 years ago
- A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs. ☆25 · Jul 27, 2021 · Updated 4 years ago
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster. ☆1,228 · Nov 7, 2023 · Updated 2 years ago
- Ultimate Website Sitemap Parser ☆251 · Jan 25, 2026 · Updated 3 months ago
- A complete and versatile web scraper. ☆3,720 · Oct 18, 2020 · Updated 5 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆24 · Apr 8, 2026 · Updated last month
- ☆15 · Oct 4, 2024 · Updated last year
- Minimalist node.js command-line & programmatic API client for imaginary ☆110 · Nov 18, 2018 · Updated 7 years ago
- DIG search and visualization user interface for the HT domain ☆12 · Oct 2, 2017 · Updated 8 years ago
- CLI for scrape-it. A Node.js scraper for humans. ☆17 · Feb 14, 2025 · Updated last year