NikolaiT/Crawling-Infrastructure

NikolaiT / Crawling-Infrastructure

Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.

☆437

Alternatives and similar repositories for Crawling-Infrastructure

Users that are interested in Crawling-Infrastructure are comparing it to the libraries listed below.

NikolaiT / scrapeulous
Cloud crawler functions for scrapeulous
☆44 · Feb 24, 2021 · Updated 5 years ago
NikolaiT / se-scraper
Javascript scraping module based on puppeteer for many different search engines...
☆570 · Dec 30, 2022 · Updated 3 years ago
NikolaiT / stealthy-scraping-tools
Minimal set of tools to conduct stealthy scraping.
☆166 · Apr 21, 2023 · Updated 3 years ago
NikolaiT / struktur
Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON.
☆70 · Jun 8, 2021 · Updated 5 years ago
NikolaiT / GoogleScraper
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.
☆2,869 · Jul 3, 2021 · Updated 5 years ago
NikolaiT / IP-Address-API
This repository contains instructions how to use the free IP Address API. The databases are: ASN database, Geolocation database, hosting …
☆117 · Updated this week
scrapoxy / scrapoxy
Scrapoxy has been discontinued.
☆2,414 · Feb 7, 2026 · Updated 5 months ago
prescience-data / 4g-rotator
📡 Renew the IP address of a tethered Android device via Node asynchronously.
☆76 · Aug 3, 2023 · Updated 2 years ago
berstend / puppeteer-extra
💯 Teach puppeteer new tricks through plugins.
☆7,381 · Jul 18, 2024 · Updated 2 years ago
niespodd / browser-fingerprinting
Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprint…
☆5,109 · May 12, 2026 · Updated 2 months ago
pavlealeksic / puppeteer-afp
Solution to stop sites from fingerprinting your puppeteer
☆129 · Jun 12, 2026 · Updated last month
jsoverson / hackium
☆174 · Dec 30, 2022 · Updated 3 years ago
ulixee / secret-agent
The web scraper that's nearly impossible to block - now called @ulixee/hero
☆737 · Mar 7, 2023 · Updated 3 years ago
VirtueSecurity / benigncertain-monitor
☆20 · Apr 21, 2020 · Updated 6 years ago
limbenjamin / LogServiceCrash
POC code to crash Windows Event Logger Service
☆27 · Oct 16, 2020 · Updated 5 years ago
dafthack / lab_scripts
Repo for hosting various scripts for creating users for password spraying and other password attacks.
☆11 · Jul 9, 2020 · Updated 6 years ago
symkat / BlogDB
The BlogDB Webservice
☆13 · Feb 1, 2022 · Updated 4 years ago
yujiosaka / headless-chrome-crawler
Distributed crawler powered by Headless Chrome
☆5,642 · Apr 29, 2023 · Updated 3 years ago
freshness79 / unlock
Microsoft Applocker evasion tool
☆39 · Nov 26, 2019 · Updated 6 years ago
Caprico1 / kinsing
Docker kinsing malware bitcoin/xmr miner
☆21 · Feb 18, 2021 · Updated 5 years ago
threatexpress / edc
Event Data Collector
☆40 · Mar 23, 2026 · Updated 4 months ago
thomasdondorf / puppeteer-cluster
Puppeteer Pool, run a cluster of instances in parallel
☆3,517 · Mar 1, 2026 · Updated 4 months ago
brendonboshell / supercrawler
A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and con…
☆381 · Dec 30, 2022 · Updated 3 years ago
prescience-data / harden-puppeteer
🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates.
☆56 · Mar 6, 2021 · Updated 5 years ago
fox-it / signed-phishing-email
☆11 · Dec 18, 2018 · Updated 7 years ago
checkymander / sshiva
C# application that allows you to quick run SSH commands against a host or list of hosts
☆42 · Sep 21, 2020 · Updated 5 years ago
JoshSchwarz / Bloodhound-Cypher
BH Cypher Queries picked up from random places
☆41 · Dec 12, 2018 · Updated 7 years ago
NikolaiT / free-proxy-list
List of free and checked http, https, socks4 and socks5 proxies
☆22 · Updated this week
antoinevastel / bots-zoo
☆117 · Mar 16, 2024 · Updated 2 years ago
apify / crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …
☆24,966 · Updated this week
xorrior / Leviathan
A simple, quick, and dirty websocket shell for PowerShell.
☆20 · Jun 5, 2017 · Updated 9 years ago
scrapinghub / article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
☆376 · May 29, 2026 · Updated last month
unblocked-web / double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
☆139 · Oct 31, 2022 · Updated 3 years ago
digitalhurricane-io / puppeteer-detection-100-percent
How to detect puppeteer with 100% accuracy
☆108 · May 30, 2021 · Updated 5 years ago
cbuto / greynoise-visualizer
Web application to visualize GreyNoise API data
☆21 · Dec 4, 2018 · Updated 7 years ago
istresearch / scrapy-cluster
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
☆1,226 · Nov 7, 2023 · Updated 2 years ago
lloydamiller / politicosint
OSINT Resources for Politics
☆14 · Aug 13, 2018 · Updated 7 years ago
reanalytics-databoutique / advanced-scrapy-proxies
Scrapy rotation proxy package with advanced functions
☆94 · Jul 4, 2022 · Updated 4 years ago
Cuadrix / puppeteer-page-proxy
Additional module to use with 'puppeteer' for setting proxies per page basis.
☆449 · Jun 9, 2024 · Updated 2 years ago