brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

☆381

Alternatives and similar repositories for supercrawler

Users who are interested in supercrawler are comparing it to the libraries listed below.

simplecrawler / simplecrawler
Flexible event driven crawler for node.
☆2,134 · Mar 7, 2021 · Updated 5 years ago
yujiosaka / headless-chrome-crawler
Distributed crawler powered by Headless Chrome
☆5,642 · Apr 29, 2023 · Updated 3 years ago
apache / stormcrawler
A scalable, mature and versatile web crawler based on Apache Storm
☆986 · Updated this week
alixaxel / pagerank.js
Vanilla JavaScript implementation of the Weighted PageRank Algorithm
☆34 · Jun 23, 2019 · Updated 7 years ago
BruceDone / awesome-crawler
A collection of awesome web crawler, spider in different languages
☆7,257 · Jun 16, 2024 · Updated 2 years ago
NikolaiT / struktur
Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON.
☆70 · Jun 8, 2021 · Updated 5 years ago
scrapinghub / frontera
A scalable frontier for web crawlers
☆1,332 · Jun 6, 2025 · Updated last year
silverwind / pagediff
Visually diff websites
☆20 · Jan 22, 2018 · Updated 8 years ago
NikolaiT / Crawling-Infrastructure
Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
☆437 · Dec 30, 2022 · Updated 3 years ago
realhidden / wappalyzer-puppeteer
☆10 · Dec 23, 2019 · Updated 6 years ago
matthewmueller / x-ray
The next web scraper. See through the <html> noise.
☆5,903 · May 6, 2026 · Updated 2 months ago
IonicaBizau / scrape-it
🔮 A Node.js scraper for humans.
☆4,074 · Jul 7, 2026 · Updated 2 weeks ago
anvaka / ngraph.pagerank
PageRank calculation for ngraph.graph
☆30 · Apr 28, 2026 · Updated 2 months ago
aimee-gm / text-cleaner
A simple method of cleaning strings
☆12 · Nov 2, 2023 · Updated 2 years ago
wlabatey / job_scraper
A job scraper using the Scrapy framework
☆16 · Oct 20, 2017 · Updated 8 years ago
ReedD / crawler
Chromium / Puppeteer site crawler
☆48 · Mar 30, 2020 · Updated 6 years ago
alexpnt / MusicWallet
Django app to manage musics, users and their favourite musics
☆11 · May 24, 2019 · Updated 7 years ago
N0taN3rd / Squidwarc
Squidwarc is a high fidelity, user scriptable, archival crawler that uses Chrome or Chromium with or without a head
☆178 · May 19, 2020 · Updated 6 years ago
thomasdondorf / puppeteer-cluster
Puppeteer Pool, run a cluster of instances in parallel
☆3,517 · Mar 1, 2026 · Updated 4 months ago
mattcarlotta / snackables
Deprecated. Use https://github.com/no-shot/env instead!
☆11 · May 31, 2021 · Updated 5 years ago
Bartozzz / crawlerr
A simple and fully customizable web crawler/spider for Node.js with server-side DOM. Comes with elegant and hell-simple APIs.
☆25 · Jul 27, 2021 · Updated 4 years ago
NikolaiT / se-scraper
Javascript scraping module based on puppeteer for many different search engines...
☆570 · Dec 30, 2022 · Updated 3 years ago
craftzdog / extract-main-text-node
ExtractContent for node.js
☆15 · Apr 14, 2026 · Updated 3 months ago
apify / crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …
☆24,966 · Updated this week
fergiemcdowall / fergies-inverted-index
Throw JavaScript objects at the index and they will become retrievable by their properties using promises and map-reduce
☆20 · Aug 8, 2025 · Updated 11 months ago
urish / real-trex-runner
IRL version of Chrome Offline T-Rex game
☆12 · Apr 16, 2025 · Updated last year
Backlinko-LLC / search-engine-ranking
📊 Repository for the study on 11.8 Million Google Search Results
☆27 · Mar 11, 2020 · Updated 6 years ago
jprichardson / node-google
A Node.js module to search and scrape Google.
☆456 · Oct 4, 2018 · Updated 7 years ago
scrapoxy / scrapoxy
Scrapoxy has been discontinued.
☆2,414 · Feb 7, 2026 · Updated 5 months ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 · May 3, 2024 · Updated 2 years ago
studiometa / ui
📦 A set of small and performant JS and Twig components
☆12 · Updated this week
istresearch / scrapy-cluster
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
☆1,226 · Nov 7, 2023 · Updated 2 years ago
dkpro / dkpro-c4corpus
DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate…
☆53 · Jun 12, 2020 · Updated 6 years ago
GateNLP / ultimate-sitemap-parser
Ultimate Website Sitemap Parser
☆255 · Jun 16, 2026 · Updated last month
ThomasMiconi / BOHP_RNN
Backprop training of recurrent neural networks with Hebbian plastic connections
☆20 · Jun 30, 2021 · Updated 5 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆24 · Apr 8, 2026 · Updated 3 months ago
adulau / DomainClassifier
DomainClassifier is a Python (2/3) library to extract and classify Internet domains/hostnames/IP addresses from raw unstructured text fil…
☆81 · Jan 31, 2024 · Updated 2 years ago
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
RafigRzayev / Audio_Recording_to_WAV
ESP32 - Record data with PDM microphone and save it as .wav file on SD card
☆16 · Mar 10, 2021 · Updated 5 years ago