Module that extracts structured information from a rendered HTML site and outputs JSON. HTML to JSON.
☆70Jun 8, 2021Updated 4 years ago
Alternatives and similar repositories for struktur
Users that are interested in struktur are comparing it to the libraries listed below
- Distributed crawling infrastructure running on top of serverless computation, cloud storage (such as S3) and sophisticated queues.☆438Dec 30, 2022Updated 3 years ago
- ☆116Mar 16, 2024Updated last year
- Web Page Inspection Tool UI. Article Summary, Sentiment Analysis, Keyword Extraction, Named Entity Recognition & Spell Check☆24Sep 29, 2025Updated 5 months ago
- 🧱 A uniform template to use as a foundation for Puppeteer bot construction.☆69May 6, 2021Updated 4 years ago
- A Node.js implementation of NUID☆14Dec 16, 2025Updated 2 months ago
- Cloud crawler functions for scrapeulous☆45Feb 24, 2021Updated 5 years ago
- The chrome browser controlled via puppeteer does not support switching proxies without restarting the browser. In this tutorial I show ho…☆12Dec 20, 2020Updated 5 years ago
- ☆13Jul 17, 2022Updated 3 years ago
- Camille's scraping boilerplate☆13Nov 1, 2022Updated 3 years ago
- Fast extraction of all external links from wikipedia☆13Sep 22, 2018Updated 7 years ago
- 📡 expose browser devtools port publicly with TLS and authentication.☆18Sep 10, 2024Updated last year
- chrome bot detection based off each release version. Each version has new updates or old ways to detect browser bots☆13Feb 18, 2025Updated last year
- Colorize all the photos in a directory☆15May 26, 2021Updated 4 years ago
- Javascript scraping module based on puppeteer for many different search engines...☆568Dec 30, 2022Updated 3 years ago
- Solution to stop sites from fingerprinting your puppeteer☆130Apr 21, 2024Updated last year
- Run Chrome from the Terminal☆18Aug 9, 2024Updated last year
- ☆17Dec 16, 2020Updated 5 years ago
- 🌟 Web Automation without coding and in just a few clicks. 🌟☆16Aug 10, 2020Updated 5 years ago
- Extract data from any website right in Chrome☆18Aug 24, 2018Updated 7 years ago
- Dockerized headless Chromium☆17Mar 28, 2023Updated 2 years ago
- Scraping workshop☆16Nov 21, 2016Updated 9 years ago
- Lightweight JavaScript library to interact with Chromium-based browsers via the Chrome DevTools Protocol☆27May 12, 2024Updated last year
- A single tab web browser built with puppeteer. Also, no client-side JS. Viewport is streamed with MJPEG. For realz.☆61Jul 23, 2023Updated 2 years ago
- CSV grooming, the JS way☆21Jul 8, 2019Updated 6 years ago
- 🛡🎭 A conceptual patch which modifies some vanilla puppeteer files to decrease detection rates.☆56Mar 6, 2021Updated 5 years ago
- The NESTS application was built using Tradeoff Analytics. It allows users to search for FHA-eligible homes in the US using real-time dat…☆22Jan 7, 2016Updated 10 years ago
- Undetected version of the main playwright implementation (NodeJS)☆25Jan 1, 2024Updated 2 years ago
- NodeJS library without any external dependencies to check if free HTTP/SOCKS4/SOCKS5 proxies are working/up☆27Apr 10, 2022Updated 3 years ago
- The most powerful helper class to make human-like actions with Puppeteer☆125Oct 25, 2024Updated last year
- Common utilities for wreq☆60Feb 11, 2026Updated 3 weeks ago
- A suite of tools for protecting the web's open knowledge.☆128Sep 16, 2024Updated last year
- JavaScript code of many commercial bot detectors/fingerprinting services and string deobfuscators for them if applicable.☆134Jun 30, 2021Updated 4 years ago
- 😈📚 A curated library of research papers and presentations for counter-detection and web privacy enthusiasts.☆733Feb 19, 2024Updated 2 years ago
- PDF to JSON, JSON to PDF and etc.☆12Apr 18, 2018Updated 7 years ago
- A python web scraper built on Selenium to gather profile data from okcupid.com☆11Oct 15, 2022Updated 3 years ago
- SEO Technical Standards Draft☆12Sep 26, 2024Updated last year
- HTTP proxy with per-request uTLS fingerprint mimicry and upstream proxy tunneling. Currently WIP.☆49Jan 14, 2024Updated 2 years ago
- 🏴 A straightforward forward-proxy written in Node.js.☆83Apr 27, 2024Updated last year
- ☆11Feb 20, 2025Updated last year