NikolaiT / struktur
A module that extracts structured information from a rendered HTML site and outputs it as JSON (HTML to JSON).
⭐ 69 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for struktur
- A uniform template to use as a foundation for Puppeteer bot construction. ⭐ 62 · Updated 3 years ago
- Cloud crawler functions for scrapeulous ⭐ 44 · Updated 3 years ago
- Minimal set of tools to conduct stealthy scraping. ⭐ 150 · Updated last year
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues. ⭐ 415 · Updated last year
- A test suite of common scraper detection techniques. See how detectable your scraper stack is. ⭐ 135 · Updated 2 years ago
- DFPM is a browser extension for detecting browser fingerprinting. ⭐ 115 · Updated last year
- Automatically extracts structured information from webpages ⭐ 108 · Updated 2 years ago
- An Apify actor that opens a web page in headless Chrome, analyzes the HTML and JavaScript objects, and looks for schema.org microdata and JSO… ⭐ 150 · Updated last year
- Bypassing bot detection checks with Puppeteer. ⭐ 94 · Updated 4 years ago
- Node.js package for generating browser-like headers. ⭐ 64 · Updated 2 years ago
- ⭐ 107 · Updated 7 months ago
- Modern tests to detect automated browser behavior. Covers the most important leaks from Puppeteer and Playwright. ⭐ 27 · Updated 2 weeks ago
- Fingerprinting script of Fingerprint-Scanner ⭐ 232 · Updated 7 months ago
- Bot detection tests for Puppeteer. Hide and seek! ⭐ 83 · Updated last year
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee… ⭐ 89 · Updated last year
- Extract text from HTML ⭐ 130 · Updated 4 years ago
- Add-ons for Playwright: adblocker, stealth mode ⭐ 46 · Updated 3 years ago
- Changes an EC2 machine's IP address every ten minutes using an Elastic IP ⭐ 81 · Updated 3 years ago
- A conceptual patch which modifies some vanilla Puppeteer files to decrease detection rates. ⭐ 46 · Updated 3 years ago
- Index Common Crawl archives in tabular format ⭐ 106 · Updated last week
- Tooling to access Puppeteer's internal Isolated World. ⭐ 18 · Updated 3 years ago
- Proxies Puppeteer Page requests. ⭐ 201 · Updated 2 months ago
- Node.js web scraper. Contains a command line, Docker container, Terraform module and Ansible roles for distributed cloud scraping. Support… ⭐ 107 · Updated last year
- A fork of Dragnet that also extracts the author, headline, date and keywords from context, as well as built-in metadata extraction, all in one pac… ⭐ 234 · Updated 10 months ago
- Advanced Node proxy checker (node proxy verifier, node proxy tester) with SOCKS and HTTPS support ⭐ 108 · Updated 2 years ago
- Helps extract the shortest optimal CSS selector and multi-selector. ⭐ 26 · Updated 7 years ago
- A suite of tools for protecting the web's open knowledge. ⭐ 130 · Updated last month
- Home of fingerprint injector. ⭐ 63 · Updated 2 years ago