Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
☆437Dec 30, 2022Updated 3 years ago
Alternatives and similar repositories for Crawling-Infrastructure
Users that are interested in Crawling-Infrastructure are comparing it to the libraries listed below.
- Cloud crawler functions for scrapeulous☆44Feb 24, 2021Updated 5 years ago
- Javascript scraping module based on puppeteer for many different search engines...☆568Dec 30, 2022Updated 3 years ago
- Minimal set of tools to conduct stealthy scraping.☆166Apr 21, 2023Updated 3 years ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON.☆70Jun 8, 2021Updated 5 years ago
- A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.☆2,820Jul 3, 2021Updated 4 years ago
- ☆12May 7, 2023Updated 3 years ago
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction.☆68May 6, 2021Updated 5 years ago
- This repository contains instructions how to use the free IP Address API. The databases are: ASN database, Geolocation database, hosting …☆115Jun 2, 2026Updated last week
- Scrapoxy has been discontinued.☆2,420Feb 7, 2026Updated 4 months ago
- In-Memory Key-Value Database with Persistent File Storage☆16Sep 24, 2022Updated 3 years ago
- 💯 Teach puppeteer new tricks through plugins.☆7,350Jul 18, 2024Updated last year
- 📡 Renew the IP address of a tethered Android device via Node asynchronously.☆75Aug 3, 2023Updated 2 years ago
- A simple example of using Puppeteer to test your analytics setup☆14Aug 5, 2022Updated 3 years ago
- Passive TCP/IP Fingerprinting Tool. Run this on your server and find out what Operating Systems your clients are *really* using.☆430Mar 7, 2026Updated 3 months ago
- Search google, bing, yahoo, and other search engines with python☆670Apr 2, 2025Updated last year
- Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprint…☆5,033May 12, 2026Updated 3 weeks ago
- ☆176Dec 30, 2022Updated 3 years ago
- The web scraper that's nearly impossible to block - now called @ulixee/hero☆737Mar 7, 2023Updated 3 years ago
- ☆20Apr 21, 2020Updated 6 years ago
- POC code to crash Windows Event Logger Service☆27Oct 16, 2020Updated 5 years ago
- The BlogDB Webservice☆13Feb 1, 2022Updated 4 years ago
- Repo for hosting various scripts for creating users for password spraying and other password attacks.☆11Jul 9, 2020Updated 5 years ago
- #️⃣ 🕸️ 👤 HTTP Headers Hashing☆12Aug 27, 2023Updated 2 years ago
- Puppeteer Pool, run a cluster of instances in parallel☆3,516Mar 1, 2026Updated 3 months ago
- Event Data Collector☆40Mar 23, 2026Updated 2 months ago
- Mixpost Installation with Docker Containers☆14Mar 15, 2023Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool☆24Apr 8, 2026Updated 2 months ago
- C# application that allows you to quick run SSH commands against a host or list of hosts☆42Sep 21, 2020Updated 5 years ago
- Pytest plugin to write Playwright tests with ease. Provides fixtures to have a page instance for each individual test and helpful CLI opt…☆14Aug 3, 2020Updated 5 years ago
- Nodejs lib to parse Google SERP html pages☆47Jul 27, 2023Updated 2 years ago
- BH Cypher Queries picked up from random places☆41Dec 12, 2018Updated 7 years ago
- Chromium Binary for AWS Lambda and Google Cloud Functions☆3,288Sep 3, 2024Updated last year
- A JavaScript library for generating random user agents with data that's updated daily.☆1,170Updated this week
- Article extraction benchmark: dataset and evaluation scripts☆373May 29, 2026Updated last week
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …☆23,718Updated this week
- A simple, quick, and dirty websocket shell for PowerShell.☆20Jun 5, 2017Updated 9 years ago
- Swift code to parse the quarantine history database, Chrome history database, Safari history database, and Firefox history database on ma…☆16Dec 3, 2020Updated 5 years ago
- ☆116Mar 16, 2024Updated 2 years ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is.☆139Oct 31, 2022Updated 3 years ago