html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆53 Updated 2 months ago
Alternatives and similar repositories for hext
Users who are interested in hext are comparing it to the libraries listed below:
- Rig for deploying DocumentCloud viewers to S3. ☆13 Updated 3 years ago
- Explore networks and publish narratives. ☆53 Updated 4 years ago
- a simple graph shell to explore ideas ☆115 Updated 6 months ago
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Machine learning model to recommend related content ☆19 Updated last year
- Add website scraping abilities to Datasette ☆64 Updated 2 years ago
- Pre-render Observable notebooks for automation ☆61 Updated 3 years ago
- A helper library full of URL-related heuristics. ☆70 Updated last month
- Datasette plugin for visualizing data using Vega ☆59 Updated last year
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆56 Updated last year
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- Twitter, quick. Fetch and store tweets on short notice. ☆80 Updated 8 years ago
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups ☆63 Updated last year
- Pull out versions of specific files from a gitscraping repo into individual files. ☆15 Updated 4 years ago
- Tunable full text search engine in JavaScript that: (1) works natively on web apps like Express.js; (2) easy to customize (via BM25) to s… ☆35 Updated 6 years ago
- API endpoint and UI for blockbuilder search page ☆20 Updated 2 years ago
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org ☆40 Updated this week
- Browser version of Hyphe (WIP) ☆31 Updated 2 months ago
- experiments in sorting ☆27 Updated 2 years ago
- Measure is scripts and conventions to build KPI dashboards for projects. ☆17 Updated 5 years ago
- Explore how networks change over time ☆48 Updated 4 years ago
- Turn spreadsheet data into a structured, dynamic API. ☆106 Updated last month
- A library for accessing a spreadsheet as a native Python object suitable for templating. ☆225 Updated 7 years ago
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions has near complete information about (1) which facts ar… ☆11 Updated 2 years ago
- A lightweight, standardized library accessing files and datasets, especially tabular ones (CSV, Excel). ☆73 Updated 2 years ago
- Inspect a URL and estimate if it contains a news story ☆39 Updated 7 months ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 Updated 6 years ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆18 Updated 3 years ago
- MediaScape project researching the utility of Generous Interfaces for audiovisual archives ☆10 Updated 5 months ago
- GUI text-based speech and music editor for creating radio/audio stories ☆77 Updated 2 years ago