html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆52 Updated this week
Alternatives and similar repositories for hext:
Users that are interested in hext are comparing it to the libraries listed below
- Inspect a URL and estimate if it contains a news story ☆39 Updated 4 months ago
- Add website scraping abilities to Datasette ☆62 Updated 2 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆44 Updated 7 years ago
- Browser version of Hyphe (WIP) ☆30 Updated 5 months ago
- Twitter, quick. Fetch and store tweets on short notice. ☆80 Updated 8 years ago
- generate rules from lists of words ☆16 Updated 3 years ago
- Rig for deploying DocumentCloud viewers to S3. ☆13 Updated 3 years ago
- Generating text completions based on the Mueller report ☆28 Updated 5 years ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆18 Updated 2 years ago
- experiments in sorting ☆26 Updated 2 years ago
- Snowclone a Minute! You too can write an annoying twitter bot of your choosing. ☆11 Updated 7 years ago
- 🖼 A minimalistic take on responsive iframes in the spirit of Pym.js. ☆26 Updated 2 years ago
- The Docker meets Machine Learning Tutorial You've Been Wanting! ☆38 Updated last year
- etl pipeline, graphical explorer and general toolbox for investigations with follow the money data ☆15 Updated last year
- A LevelDB backed URL unshortening microservice written in JavaScript ☆31 Updated 2 years ago
- Because what if you could just... write graphics sketches? On the web? Like, directly? ☆18 Updated last month
- A lightweight JavaScript client library for the Wikimedia Pageviews API for Wikipedia and various of its sister projects for Node.js and … ☆27 Updated 4 years ago
- A chrome extension for remote performances on other people's computers ☆49 Updated 3 years ago
- Join data in the browser. Supports csv, tsv, psv, *json and dbf. ☆11 Updated 2 years ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Tools for working with Optical Character Recognition output ☆16 Updated 11 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 Updated 6 years ago
- livecoding observable-ish experiment, just an experiment ☆22 Updated 4 years ago
- assorted text data ☆34 Updated 6 years ago
- a simple interface for extracting texts from (almost) any url ☆52 Updated 5 years ago
- Visualize the evolution of a file tracked by git ☆25 Updated 6 years ago
- Explore networks and publish narratives. ☆53 Updated 4 years ago
- A visualisation library for beneficial ownership structures ☆21 Updated last month
- A git scraper recording the CDC's Covid Data Tracker numbers on number of vaccinations per state. ☆24 Updated last year
- A simple app to add OAuth-based authentication in front of an S3 bucket-based static website. ☆11 Updated 2 years ago