html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆53 · Updated last month
Alternatives and similar repositories for hext:
Users who are interested in hext compare it to the libraries listed below
- Rig for deploying DocumentCloud viewers to S3. ☆13 · Updated 3 years ago
- Browser version of Hyphe (WIP) ☆30 · Updated 6 months ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 · Updated 4 years ago
- 🖼 A minimalistic take on responsive iframes in the spirit of Pym.js. ☆26 · Updated 2 years ago
- Trough: Big data, small databases. ☆41 · Updated 9 months ago
- A LevelDB backed URL unshortening microservice written in JavaScript ☆31 · Updated 2 years ago
- Extract networks of entities from journalistic reporting ☆48 · Updated last year
- API endpoint and UI for blockbuilder search page ☆20 · Updated 2 years ago
- Add website scraping abilities to Datasette ☆62 · Updated 2 years ago
- API implementation, User Interface, and more modules of the IPTC EXTRA project ☆12 · Updated 3 years ago
- Datasette plugin for serving media based on a SQL query ☆18 · Updated 2 years ago
- generate rules from lists of words ☆16 · Updated 3 years ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆18 · Updated 2 years ago
- a simple interface for extracting text from (almost) any url ☆52 · Updated 5 years ago
- A living styleguide powering the Mapzen brand (TM) ☆13 · Updated 5 years ago
- The HSV (Hue, Saturation, Value) color space. ☆27 · Updated 4 years ago
- ALPHA ~ A web extension framework for collecting rich, customized browsing history datasets. ☆20 · Updated 3 years ago
- Web interface for Cayley ☆25 · Updated last year
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 · Updated 7 years ago
- 📄 A simple wrapper around the Google Docs API and ArchieML for easily converting the contents of a Google Doc into an ArchieML-produced d… ☆23 · Updated last year
- Pull out versions of specific files from a gitscraping repo into individual files. ☆15 · Updated 3 years ago
- download and process d3.js blocks for further indexing and visualization ☆24 · Updated 5 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 · Updated 6 years ago
- A diff tool for SVG files ☆39 · Updated 11 years ago
- A thin GraphQL wrapper around spacy ☆21 · Updated 4 years ago
- Simple service to generate Observablehq notebook previews outside the Observablehq UI. It also reproduces Observablehq UI styles ☆12 · Updated 3 years ago
- Frontend interface for Datashare, a self-hosted search engine for documents. ☆34 · Updated this week
- convert formatted text to markdown ☆12 · Updated 2 years ago
- Web interface for network analysis. ☆21 · Updated 2 years ago
- Generating text completions based on the Mueller report ☆28 · Updated 6 years ago