NikolaiT / struktur
Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON.
☆70Updated 4 years ago
Alternatives and similar repositories for struktur
Users that are interested in struktur are comparing it to the libraries listed below
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.☆431Updated 2 years ago
- Minimal set of tools to conduct stealthy scraping.☆160Updated 2 years ago
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction.☆68Updated 4 years ago
- A test suite of common scraper detection techniques. See how detectable your scraper stack is.☆140Updated 2 years ago
- A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppetee…☆97Updated 2 years ago
- NodeJs package for generating browser-like headers.☆72Updated 3 years ago
- 🔐 Tooling to access Puppeteer's internal Isolated World.☆22Updated 4 years ago
- Automatically extracts structured information from webpages☆109Updated 3 years ago
- 🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates.☆55Updated 4 years ago
- Is headless chrome currently detectable? Let's pit the detections and detection evasions against each other.☆660Updated 4 years ago
- DFPM is a browser extension for detecting browser fingerprinting.☆124Updated 2 years ago
- Cloud crawler functions for scrapeulous☆45Updated 4 years ago
- ☆115Updated last year
- A simple puppeteer wrapper to enable useful plugins with ease☆57Updated last week
- Bypassing bot detection checks with Puppeteer.☆93Updated 5 years ago
- Proxies Puppeteer Page requests.☆212Updated last year
- Generates realistic browser fingerprints☆83Updated 3 years ago
- A fork of Dragnet that also extracts author, headline, date, keywords from context, as well as built-in metadata extraction all in one pac…☆292Updated 3 months ago
- A complementary proxy to help use SPM with headless browsers☆108Updated 2 years ago
- Nodejs web scraper. Contains a command line, docker container, terraform module and ansible roles for distributed cloud scraping. Support…☆114Updated 2 years ago
- Add-ons for Playwright: adblocker, stealth mode☆46Updated 4 years ago
- A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and con…☆380Updated 2 years ago
- Advanced Node proxy checker (node proxy verifier, node proxy tester) with socks and https support☆109Updated 3 years ago
- 📡 Renew the IP address of a tethered Android device via Node asynchronously.☆76Updated 2 years ago
- The web scraper that's nearly impossible to block - now called @ulixee/hero☆722Updated 2 years ago
- Email automation driven by headless chrome.☆167Updated 4 years ago
- Fingerprinting script of Fingerprint-Scanner☆253Updated 5 months ago
- Javascript scraping module based on puppeteer for many different search engines...☆562Updated 2 years ago
- House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.☆124Updated 2 years ago
- Nodejs lib to parse Google SERP html pages☆46Updated 2 years ago