html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆52 Updated 2 months ago
Alternatives and similar repositories for hext:
Users who are interested in hext are comparing it to the repositories listed below
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near-complete information about (1) which facts ar… ☆10 Updated 2 years ago
- Rig for deploying DocumentCloud viewers to S3. ☆13 Updated 3 years ago
- Pull out versions of specific files from a gitscraping repo into individual files. ☆15 Updated 3 years ago
- Browser version of Hyphe (WIP) ☆30 Updated 3 months ago
- Measure is a set of scripts and conventions for building KPI dashboards for projects. ☆17 Updated 4 years ago
- interactive, customizable semantic web visualization ☆14 Updated 3 months ago
- A network clustering library for JavaScript ☆34 Updated last year
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 3 years ago
- Generative tree visualiser for Python ☆14 Updated 4 years ago
- Add editing UI and other power-user features to Datasette. ☆12 Updated last year
- generate rules from lists of words ☆16 Updated 3 years ago
- Join data in the browser. Supports csv, tsv, psv, *json and dbf. ☆11 Updated 2 years ago
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification. ☆23 Updated 11 months ago
- an image annotation and publication tool ☆27 Updated 4 years ago
- Add website scraping abilities to Datasette ☆62 Updated last year
- A collection of visualization projects built on Wikipedia data. ☆40 Updated 2 years ago
- experiments in sorting ☆25 Updated 2 years ago
- A LevelDB backed URL unshortening microservice written in JavaScript ☆31 Updated 2 years ago
- ALPHA ~ A web extension framework for collecting rich, customized browsing history datasets. ☆19 Updated 3 years ago
- A lightweight JavaScript client library for the Wikimedia Pageviews API for Wikipedia and various of its sister projects for Node.js and … ☆27 Updated 4 years ago
- Machine learning model to recommend related content ☆19 Updated last year
- a simple interface for extracting text from (almost) any URL ☆52 Updated 5 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- API endpoint and UI for blockbuilder search page ☆20 Updated 2 years ago
- ETL pipeline, graphical explorer and general toolbox for investigations with follow the money data ☆14 Updated last year
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆18 Updated 2 years ago
- Examples of bad data, especially from government. ☆22 Updated 5 months ago
- Tools for working with Optical Character Recognition output ☆16 Updated 10 years ago
- A JSON dataset of information about language museums around the world ☆12 Updated 4 years ago