NikolaiT / struktur
Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON.
☆70 · Updated 3 years ago
Alternatives and similar repositories for struktur:
Users that are interested in struktur are comparing it to the libraries listed below
- A uniform template to use as a foundation for Puppeteer bot construction. ☆66 · Updated 3 years ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is. ☆137 · Updated 2 years ago
- A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates. ☆54 · Updated 4 years ago
- NodeJs package for generating browser-like headers. ☆69 · Updated 2 years ago
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… ☆94 · Updated 2 years ago
- Minimal set of tools to conduct stealthy scraping. ☆156 · Updated 2 years ago
- DFPM is a browser extension for detecting browser fingerprinting. ☆116 · Updated 2 years ago
- Proxies Puppeteer Page requests. ☆208 · Updated 7 months ago
- Tooling to access Puppeteer's internal Isolated World. ☆22 · Updated 4 years ago
- Modern tests to detect automated browser behavior. Cover most important leaks from Puppeteer and Playwright. ☆70 · Updated 6 months ago
- A web page that compiles methods used by Akamai, Datadome, and other bot detection solutions and WAF (Web Application Firewall) to identi… ☆43 · Updated 4 years ago
- Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSO… ☆150 · Updated 2 years ago
- ☆114 · Updated last year
- Bypassing bot detection checks with Puppeteer. ☆93 · Updated 4 years ago
- How to detect puppeteer with 100% accuracy ☆108 · Updated 3 years ago
- Cloud crawler functions for scrapeulous ☆45 · Updated 4 years ago
- Bot detection tests for Puppeteer. Hide and seek! ☆94 · Updated 2 years ago
- A simple puppeteer wrapper to enable useful plugins with ease ☆56 · Updated last week
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ☆429 · Updated 2 years ago
- ☆25 · Updated 3 years ago
- A complementary proxy to help use SPM with headless browsers ☆108 · Updated last year
- The Web Scraping Club Free Repository ☆139 · Updated 5 months ago
- Vindicate non-organic web traffic via MITM proxy ☆54 · Updated 9 months ago
- Patching CDP (Chrome DevTools Protocol) leaks on OS level. Easy to use with Playwright, Selenium, and other web automation tools. ☆113 · Updated 8 months ago
- Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages. ☆102 · Updated 6 years ago
- Add-ons for Playwright: adblocker, stealth mode ☆46 · Updated 4 years ago
- JavaScript code of many commercial bot detectors/fingerprinting services and string deobfuscators for them if applicable. ☆128 · Updated 3 years ago
- Modification of ghost-cursor for Puppeteer, with more functionality and rewritten to work well with Playwright. ☆64 · Updated last year
- Fingerprinting script of Fingerprint-Scanner ☆245 · Updated last month
- Advanced Node proxy checker (node proxy verifier, node proxy tester) with socks and https support ☆109 · Updated 2 years ago