Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
☆437 · Dec 30, 2022 · Updated 3 years ago
Alternatives and similar repositories for Crawling-Infrastructure
Users that are interested in Crawling-Infrastructure are comparing it to the libraries listed below.
- Cloud crawler functions for scrapeulous ☆44 · Feb 24, 2021 · Updated 5 years ago
- Javascript scraping module based on puppeteer for many different search engines... ☆568 · Dec 30, 2022 · Updated 3 years ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON. ☆70 · Jun 8, 2021 · Updated 5 years ago
- A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support. ☆2,824 · Jul 3, 2021 · Updated 4 years ago
- ☆12 · May 7, 2023 · Updated 3 years ago
- This repository contains instructions how to use the free IP Address API. The databases are: ASN database, Geolocation database, hosting … ☆116 · Updated this week
- List of free and checked http, https, socks4 and socks5 proxies ☆22 · Updated this week
- Scrapoxy has been discontinued. ☆2,414 · Feb 7, 2026 · Updated 4 months ago
- In-Memory Key-Value Database with Persistent File Storage ☆16 · Sep 24, 2022 · Updated 3 years ago
- 💯 Teach puppeteer new tricks through plugins. ☆7,365 · Jul 18, 2024 · Updated last year
- Passive TCP/IP Fingerprinting Tool. Run this on your server and find out what Operating Systems your clients are *really* using. ☆433 · Mar 7, 2026 · Updated 3 months ago
- Search google, bing, yahoo, and other search engines with python ☆669 · Apr 2, 2025 · Updated last year
- Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprint… ☆5,045 · May 12, 2026 · Updated last month
- Solution to stop sites from fingerprinting your puppeteer ☆129 · Jun 12, 2026 · Updated 2 weeks ago
- ☆176 · Dec 30, 2022 · Updated 3 years ago
- The web scraper that's nearly impossible to block - now called @ulixee/hero ☆737 · Mar 7, 2023 · Updated 3 years ago
- ☆20 · Apr 21, 2020 · Updated 6 years ago
- Assorted, MIT licensed, threat hunting rules from @bradleyjkemp ☆14 · Mar 11, 2022 · Updated 4 years ago
- #️⃣ 🕸️ 👤 HTTP Headers Hashing ☆12 · Aug 27, 2023 · Updated 2 years ago
- Distributed crawler powered by Headless Chrome ☆5,643 · Apr 29, 2023 · Updated 3 years ago
- Microsoft Applocker evasion tool ☆39 · Nov 26, 2019 · Updated 6 years ago
- Puppeteer Pool, run a cluster of instances in parallel ☆3,514 · Mar 1, 2026 · Updated 4 months ago
- Docker kinsing malware bitcoin/xmr miner ☆21 · Feb 18, 2021 · Updated 5 years ago
- A streamlined tool for decoding and simplifying JavaScript obfuscated by Datadome's Interstitial challenge, enhancing readability and mai… ☆35 · Jan 12, 2024 · Updated 2 years ago
- A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and con… ☆381 · Dec 30, 2022 · Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool ☆24 · Apr 8, 2026 · Updated 2 months ago
- 🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates. ☆56 · Mar 6, 2021 · Updated 5 years ago
- ☆11 · Dec 18, 2018 · Updated 7 years ago
- BH Cypher Queries picked up from random places ☆41 · Dec 12, 2018 · Updated 7 years ago
- React custom hooks for uploading files to a s3 bucket with progress showing abilities ☆21 · Feb 13, 2026 · Updated 4 months ago
- Chromium Binary for AWS Lambda and Google Cloud Functions ☆3,288 · Sep 3, 2024 · Updated last year
- A JavaScript library for generating random user agents with data that's updated daily. ☆1,178 · Updated this week
- Article extraction benchmark: dataset and evaluation scripts ☆375 · May 29, 2026 · Updated last month
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆24,227 · Updated this week
- A simple, quick, and dirty websocket shell for PowerShell. ☆20 · Jun 5, 2017 · Updated 9 years ago
- Swift code to parse the quarantine history database, Chrome history database, Safari history database, and Firefox history database on ma… ☆16 · Dec 3, 2020 · Updated 5 years ago
- ☆116 · Mar 16, 2024 · Updated 2 years ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆38 · Apr 23, 2026 · Updated 2 months ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is. ☆139 · Oct 31, 2022 · Updated 3 years ago