Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.
☆438Dec 30, 2022Updated 3 years ago
Alternatives and similar repositories for Crawling-Infrastructure
Users that are interested in Crawling-Infrastructure are comparing it to the libraries listed below
- Cloud crawler functions for scrapeulous☆45Feb 24, 2021Updated 5 years ago
- Javascript scraping module based on puppeteer for many different search engines...☆568Dec 30, 2022Updated 3 years ago
- Minimal set of tools to conduct stealthy scraping.☆163Apr 21, 2023Updated 2 years ago
- Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON.☆70Jun 8, 2021Updated 4 years ago
- A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, ...). Including asynchronous networking support.☆2,801Jul 3, 2021Updated 4 years ago
- ☆12May 7, 2023Updated 2 years ago
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction.☆69May 6, 2021Updated 4 years ago
- Scrapoxy has been discontinued.☆2,425Feb 7, 2026Updated last month
- In-Memory Key-Value Database with Persistent File Storage☆16Sep 24, 2022Updated 3 years ago
- Passive TCP/IP Fingerprinting Tool. Run this on your server and find out what Operating Systems your clients are *really* using.☆410Mar 7, 2026Updated last week
- Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprint…☆4,968Jul 17, 2024Updated last year
- Solution to stop sites from fingerprinting your puppeteer☆130Apr 21, 2024Updated last year
- ☆177Dec 30, 2022Updated 3 years ago
- The web scraper that's nearly impossible to block - now called @ulixee/hero☆729Mar 7, 2023Updated 3 years ago
- Assorted, MIT licensed, threat hunting rules from @bradleyjkemp☆14Mar 11, 2022Updated 4 years ago
- The BlogDB Webservice☆12Feb 1, 2022Updated 4 years ago
- Repo for hosting various scripts for creating users for password spraying and other password attacks.☆11Jul 9, 2020Updated 5 years ago
- Distributed crawler powered by Headless Chrome☆5,706Apr 29, 2023Updated 2 years ago
- Puppeteer Pool, run a cluster of instances in parallel☆3,513Mar 1, 2026Updated 2 weeks ago
- Event Data Collector☆39Jan 12, 2026Updated 2 months ago
- Docker kinsing malware bitcoin/xmr miner☆23Feb 18, 2021Updated 5 years ago
- A streamlined tool for decoding and simplifying JavaScript obfuscated by Datadome's Interstitial challenge, enhancing readability and mai…☆32Jan 12, 2024Updated 2 years ago
- Mixpost Installation with Docker Containers☆14Mar 15, 2023Updated 3 years ago
- 🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates.☆56Mar 6, 2021Updated 5 years ago
- C# application that allows you to quick run SSH commands against a host or list of hosts☆42Sep 21, 2020Updated 5 years ago
- ☆11Dec 18, 2018Updated 7 years ago
- Pytest plugin to write Playwright tests with ease. Provides fixtures to have a page instance for each individual test and helpful CLI opt…☆14Aug 3, 2020Updated 5 years ago
- Chromium Binary for AWS Lambda and Google Cloud Functions☆3,291Sep 3, 2024Updated last year
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …☆22,366Updated this week
- A JavaScript library for generating random user agents with data that's updated daily.☆1,144Updated this week
- Article extraction benchmark: dataset and evaluation scripts☆356Mar 1, 2026Updated 2 weeks ago
- A simple, quick, and dirty websocket shell for PowerShell.☆20Jun 5, 2017Updated 8 years ago
- Swift code to parse the quarantine history database, Chrome history database, Safari history database, and Firefox history database on ma…☆15Dec 3, 2020Updated 5 years ago
- ☆116Mar 16, 2024Updated 2 years ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is.☆141Oct 31, 2022Updated 3 years ago
- Streaming web crawler with WebSocket API☆45Updated this week
- This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.☆1,229Nov 7, 2023Updated 2 years ago
- Scrapy rotation proxy package with advanced functions☆94Jul 4, 2022Updated 3 years ago
- How to detect puppeteer with 100% accuracy☆108May 30, 2021Updated 4 years ago