Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
☆439 · Dec 30, 2022 · Updated 3 years ago
Alternatives and similar repositories for Crawling-Infrastructure
Users that are interested in Crawling-Infrastructure are comparing it to the libraries listed below.
- Cloud crawler functions for scrapeulous ☆45 · Feb 24, 2021 · Updated 5 years ago
- JavaScript scraping module based on puppeteer for many different search engines... ☆569 · Dec 30, 2022 · Updated 3 years ago
- Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON. ☆70 · Jun 8, 2021 · Updated 4 years ago
- A Python module to scrape several search engines (like Google, Yandex, Bing, DuckDuckGo, ...), including asynchronous networking support. ☆2,812 · Jul 3, 2021 · Updated 4 years ago
- ☆12 · May 7, 2023 · Updated 3 years ago
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction. ☆68 · May 6, 2021 · Updated 5 years ago
- This repository contains instructions on how to use the free IP Address API. The databases are: ASN database, Geolocation database, hosting … ☆115 · May 13, 2026 · Updated last week
- List of free and checked http, https, socks4 and socks5 proxies ☆21 · May 4, 2026 · Updated 2 weeks ago
- Scrapoxy has been discontinued. ☆2,421 · Feb 7, 2026 · Updated 3 months ago
- 💯 Teach puppeteer new tricks through plugins. ☆7,328 · Jul 18, 2024 · Updated last year
- 📡 Renew the IP address of a tethered Android device via Node asynchronously. ☆75 · Aug 3, 2023 · Updated 2 years ago
- Search google, bing, yahoo, and other search engines with python ☆670 · Apr 2, 2025 · Updated last year
- Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot systems 👻 and get around browser fingerprint… ☆5,017 · May 12, 2026 · Updated last week
- Solution to stop sites from fingerprinting your puppeteer ☆129 · Apr 21, 2024 · Updated 2 years ago
- ☆176 · Dec 30, 2022 · Updated 3 years ago
- The web scraper that's nearly impossible to block - now called @ulixee/hero ☆735 · Mar 7, 2023 · Updated 3 years ago
- POC code to crash Windows Event Logger Service ☆27 · Oct 16, 2020 · Updated 5 years ago
- The BlogDB Webservice ☆13 · Feb 1, 2022 · Updated 4 years ago
- Repo for hosting various scripts for creating users for password spraying and other password attacks. ☆11 · Jul 9, 2020 · Updated 5 years ago
- #️⃣ 🕸️ 👤 HTTP Headers Hashing ☆13 · Aug 27, 2023 · Updated 2 years ago
- Distributed crawler powered by Headless Chrome ☆5,637 · Apr 29, 2023 · Updated 3 years ago
- Microsoft Applocker evasion tool ☆39 · Nov 26, 2019 · Updated 6 years ago
- Puppeteer Pool, run a cluster of instances in parallel ☆3,516 · Mar 1, 2026 · Updated 2 months ago
- Event Data Collector ☆40 · Mar 23, 2026 · Updated last month
- Docker kinsing malware bitcoin/xmr miner ☆22 · Feb 18, 2021 · Updated 5 years ago
- A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and con… ☆382 · Dec 30, 2022 · Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆24 · Apr 8, 2026 · Updated last month
- 🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates. ☆56 · Mar 6, 2021 · Updated 5 years ago
- ☆11 · Dec 18, 2018 · Updated 7 years ago
- Nodejs lib to parse Google SERP html pages ☆46 · Jul 27, 2023 · Updated 2 years ago
- BH Cypher Queries picked up from random places ☆41 · Dec 12, 2018 · Updated 7 years ago
- Web application/technology detection tool ☆214 · Sep 1, 2023 · Updated 2 years ago
- A JavaScript library for generating random user agents with data that's updated daily. ☆1,160 · Updated this week
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆23,308 · Updated this week
- Swift code to parse the quarantine history database, Chrome history database, Safari history database, and Firefox history database on ma… ☆16 · Dec 3, 2020 · Updated 5 years ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆38 · Apr 23, 2026 · Updated 3 weeks ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is. ☆139 · Oct 31, 2022 · Updated 3 years ago
- Parse search engine HTML to retrieve ads and other stuff; supports Google and Bing ☆11 · Jun 16, 2024 · Updated last year
- Streaming web crawler with WebSocket API ☆46 · Apr 8, 2026 · Updated last month