NikolaiT / struktur
Module that extracts structured information from a rendered html site and outputs JSON. HTML to JSON.
★70 · Updated 3 years ago
Alternatives and similar repositories for struktur
Users who are interested in struktur compare it to the libraries listed below.
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction. ★66 · Updated 4 years ago
- Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSO… ★150 · Updated 2 years ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is. ★136 · Updated 2 years ago
- NodeJs package for generating browser-like headers. ★71 · Updated 2 years ago
- Cloud crawler functions for scrapeulous ★45 · Updated 4 years ago
- DFPM is a browser extension for detecting browser fingerprinting. ★118 · Updated 2 years ago
- Minimal set of tools to conduct stealthy scraping. ★156 · Updated 2 years ago
- A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates. ★54 · Updated 4 years ago
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… ★94 · Updated 2 years ago
- Automatically extracts structured information from webpages ★109 · Updated 2 years ago
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ★430 · Updated 2 years ago
- Bypassing bot detection checks with Puppeteer. ★93 · Updated 4 years ago
- Add-ons for Playwright: adblocker, stealth mode ★46 · Updated 4 years ago
- Scrapy rotation proxy package with advanced functions ★95 · Updated 2 years ago
- A web page that compiles methods used by Akamai, Datadome, and other bot detection solutions and WAF (Web Application Firewall) to identi… ★42 · Updated 4 years ago
- Tooling to access Puppeteer's internal Isolated World. ★22 · Updated 4 years ago
- ★115 · Updated last year
- 🕵️ Bot detection tests for Puppeteer. Hide and seek! ★93 · Updated 2 years ago
- Fingerprinting script of Fingerprint-Scanner ★246 · Updated 2 months ago
- A suite of tools for protecting the web's open knowledge. ★127 · Updated 8 months ago
- A simple puppeteer wrapper to enable useful plugins with ease ★56 · Updated this week
- Parse And Create Web ARChive (WARC) files with node.js ★98 · Updated 3 months ago
- Home of fingerprint injector. ★69 · Updated 2 years ago
- Nodejs lib to parse Google SERP html pages ★47 · Updated last year
- Generates realistic browser fingerprints ★78 · Updated 2 years ago
- admin ui for scrapy/open source scrapinghub ★58 · Updated 4 years ago
- Proxies Puppeteer Page requests. ★208 · Updated 8 months ago
- Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages. ★103 · Updated 6 years ago
- How to detect puppeteer with 100% accuracy ★109 · Updated 3 years ago
- CoCrawler is a versatile web crawler built using modern tools and concurrency. ★190 · Updated 3 years ago