html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆54 · Updated 4 months ago
Alternatives and similar repositories for hext
Users interested in hext are comparing it to the libraries listed below.
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 · Updated 4 years ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆18 · Updated 3 years ago
- a simple graph shell to explore ideas ☆115 · Updated last month
- Browser version of Hyphe (WIP) ☆31 · Updated 4 months ago
- Neo4j powered web application for multimedia collections: bring graph-based exploration and crowd-based indexation. ☆24 · Updated 5 years ago
- Now included in rigour ☆151 · Updated 2 weeks ago
- Add website scraping abilities to Datasette ☆64 · Updated 2 years ago
- The Datasette macOS application ☆130 · Updated last year
- Twitter, quick. Fetch and store tweets on short notice. ☆79 · Updated 8 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 · Updated 6 years ago
- A lightweight, standardized library accessing files and datasets, especially tabular ones (CSV, Excel). ☆73 · Updated 2 years ago
- Pre-render Observable notebooks for automation ☆61 · Updated 3 years ago
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups ☆63 · Updated last year
- Extract networks of entities from journalistic reporting ☆48 · Updated 2 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 · Updated 7 years ago
- A proxy to connect Observable notebooks to databases on private networks ☆55 · Updated last year
- Binary Python bindings for poppler utils for content extraction ☆42 · Updated 4 years ago
- framework for scraping legislative/government data ☆88 · Updated last year
- generate rules from lists of words ☆16 · Updated 4 years ago
- Turn spreadsheet data into a structured, dynamic API. ☆110 · Updated 3 months ago
- a work-in-progress guide to web scraping as an artistic and critical practice ☆84 · Updated 2 years ago
- Interactive open source visualization platform for multivariate dynamic networks. ☆94 · Updated 3 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆57 · Updated last year
- Datasette plugin for visualizing data using Vega ☆61 · Updated 2 years ago
- a client-side transcription text editor to proofread and correct the text before re-alignment back on the server. ☆19 · Updated 7 years ago
- d3 plugin to create a temporal network visualization ☆18 · Updated 2 years ago
- A tool that democratizes and standardizes access to Web APIs. ☆13 · Updated 2 years ago
- GUI text-based speech and music editor for creating radio/audio stories ☆78 · Updated 2 years ago
- API implementation, User Interface, and more modules of the IPTC EXTRA project ☆13 · Updated 3 years ago
- Datasette plugin for rendering HTML based on JSON values ☆28 · Updated 3 years ago