Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
☆438Dec 30, 2022Updated 3 years ago
Alternatives and similar repositories for Crawling-Infrastructure
Users that are interested in Crawling-Infrastructure are comparing it to the libraries listed below.
- Cloud crawler functions for scrapeulous☆45Feb 24, 2021Updated 5 years ago
- Javascript scraping module based on puppeteer for many different search engines...☆570Dec 30, 2022Updated 3 years ago
- Minimal set of tools to conduct stealthy scraping.☆163Apr 21, 2023Updated 2 years ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON.☆70Jun 8, 2021Updated 4 years ago
- A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.☆2,802Jul 3, 2021Updated 4 years ago
- ☆12May 7, 2023Updated 2 years ago
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction.☆69May 6, 2021Updated 4 years ago
- This repository contains instructions how to use the free IP Address API. The databases are: ASN database, Geolocation database, hosting …☆115Mar 2, 2026Updated last month
- List of free and checked http, https, socks4 and socks5 proxies☆19Mar 30, 2026Updated last week
- Scrapoxy has been discontinued.☆2,423Feb 7, 2026Updated 2 months ago
- In-Memory Key-Value Database with Persistent File Storage☆16Sep 24, 2022Updated 3 years ago
- A simple example of using Puppeteer to test your analytics setup☆14Aug 5, 2022Updated 3 years ago
- Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprint…☆4,976Jul 17, 2024Updated last year
- ☆177Dec 30, 2022Updated 3 years ago
- The web scraper that's nearly impossible to block - now called @ulixee/hero☆729Mar 7, 2023Updated 3 years ago
- ☆20Apr 21, 2020Updated 5 years ago
- POC code to crash Windows Event Logger Service☆27Oct 16, 2020Updated 5 years ago
- Assorted, MIT licensed, threat hunting rules from @bradleyjkemp☆14Mar 11, 2022Updated 4 years ago
- The BlogDB Webservice☆13Feb 1, 2022Updated 4 years ago
- Repo for hosting various scripts for creating users for password spraying and other password attacks.☆11Jul 9, 2020Updated 5 years ago
- #️⃣ 🕸️ 👤 HTTP Headers Hashing☆13Aug 27, 2023Updated 2 years ago
- Distributed crawler powered by Headless Chrome☆5,699Apr 29, 2023Updated 2 years ago
- Microsoft Applocker evasion tool☆39Nov 26, 2019Updated 6 years ago
- Puppeteer Pool, run a cluster of instances in parallel☆3,514Mar 1, 2026Updated last month
- A streamlined tool for decoding and simplifying JavaScript obfuscated by Datadome's Interstitial challenge, enhancing readability and mai…☆33Jan 12, 2024Updated 2 years ago
- A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and con…☆382Dec 30, 2022Updated 3 years ago
- Site Hound (previously THH) is a Domain Discovery Tool☆24Feb 10, 2026Updated 2 months ago
- 🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates.☆56Mar 6, 2021Updated 5 years ago
- ☆11Dec 18, 2018Updated 7 years ago
- Chromium Binary for AWS Lambda and Google Cloud Functions☆3,290Sep 3, 2024Updated last year
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …☆22,661Apr 1, 2026Updated last week
- A simple, quick, and dirty websocket shell for PowerShell.☆20Jun 5, 2017Updated 8 years ago
- Swift code to parse the quarantine history database, Chrome history database, Safari history database, and Firefox history database on ma…☆16Dec 3, 2020Updated 5 years ago
- ☆116Mar 16, 2024Updated 2 years ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is.☆140Oct 31, 2022Updated 3 years ago
- Streaming web crawler with WebSocket API☆46Updated this week
- BloodHound Cypher Queries Ported to a Jupyter Notebook☆53Jun 20, 2020Updated 5 years ago
- interactive command line interfaces for Python☆13Jan 3, 2021Updated 5 years ago
- Scrapy rotation proxy package with advanced functions☆94Jul 4, 2022Updated 3 years ago