html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆53Updated 3 weeks ago
Alternatives and similar repositories for hext
Users that are interested in hext are comparing it to the libraries listed below
- experiments in sorting☆26Updated 2 years ago
- Rig for deploying DocumentCloud viewers to S3.☆13Updated 3 years ago
- API endpoint and UI for blockbuilder search page☆20Updated 2 years ago
- Browser version of Hyphe (WIP)☆30Updated 3 weeks ago
- A lightweight JavaScript client library for the Wikimedia Pageviews API for Wikipedia and various of its sister projects for Node.js and …☆27Updated 4 years ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System☆87Updated 4 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.☆46Updated 7 years ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting☆18Updated 2 years ago
- ALPHA ~ A web extension framework for collecting rich, customized browsing history datasets.☆20Updated 3 years ago
- generate rules from lists of words☆16Updated 3 years ago
- makes supercuts from youtube searches (alpha)☆12Updated 7 years ago
- A raspberry pi 64bit image with spacy and neuralcoref pre-installed☆21Updated 5 years ago
- Generating text completions based on the Mueller report☆28Updated 6 years ago
- Examples of bad data, especially from government.☆23Updated 10 months ago
- a simple interface for extracting texts from (almost) any url☆52Updated 5 years ago
- Add website scraping abilities to Datasette☆62Updated 2 years ago
- Datasette plugin for inserting and updating data☆20Updated last year
- Add editing UI and other power-user features to Datasette.☆12Updated 2 years ago
- an image annotation and publication tool☆27Updated 4 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML)☆22Updated 6 years ago
- 🖼 A minimalistic take on responsive iframes in the spirit of Pym.js.☆26Updated 2 years ago
- A network clustering library for javascript☆34Updated last month
- Datasette plugin for rendering HTML based on JSON values☆26Updated 3 years ago
- Pull out versions of specific files from a gitscraping repo into individual files.☆15Updated 3 years ago
- Machine assisted dossiers☆19Updated 7 years ago
- Extract networks of entities from journalistic reporting☆48Updated last year
- Like Tabletop.js — but for Google Docs!☆66Updated 8 years ago
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups☆63Updated last year
- d3 plugin to create a temporal network visualization☆18Updated 2 years ago
- download and process d3.js blocks for further indexing and visualization☆24Updated 6 years ago