html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆54 Updated 3 weeks ago
Alternatives and similar repositories for hext
Users who are interested in hext are comparing it to the libraries listed below.
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- Add website scraping abilities to Datasette ☆65 Updated 2 years ago
- Datasette plugin for visualizing data using Vega ☆61 Updated last week
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆19 Updated 3 years ago
- The Datasette macOS application ☆132 Updated last year
- Pre-render Observable notebooks for automation ☆61 Updated 3 years ago
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups ☆63 Updated last month
- Twitter, quick. Fetch and store tweets on short notice. ☆79 Updated 8 years ago
- Add editing UI and other power-user features to Datasette. ☆13 Updated 2 years ago
- Now included in rigour ☆152 Updated 2 months ago
- NWJS os x desktop based application that given a video/audio file returns a transcription using IBM Watson Speech to text API ☆41 Updated 8 years ago
- An alpha project combining beneficial ownership and contracting data ☆13 Updated 4 years ago
- experiments in sorting ☆27 Updated 2 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 Updated 7 years ago
- Browser version of Hyphe (WIP) ☆31 Updated 5 months ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- API implementation, User Interface, and more modules of the IPTC EXTRA project ☆13 Updated 3 years ago
- Monitor datasets, get alerts when something happens ☆210 Updated 6 years ago
- Extract networks of entities from journalistic reporting ☆48 Updated 2 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- A data pipeline helper written in node to convert a folder of JS/ArchieML/JSON/YAML/CSV/TSV files into usable data. ☆47 Updated 2 years ago
- A network clustering library for javascript ☆35 Updated 3 months ago
- A library for accessing a spreadsheet as a native Python object suitable for templating. ☆226 Updated 7 years ago
- A tool that democratizes and standardizes access to Web APIs. ☆14 Updated 2 years ago
- framework for scraping legislative/government data ☆88 Updated last year
- Computer assisted video/audio transcription ☆97 Updated 5 years ago
- API endpoint and UI for blockbuilder search page ☆20 Updated 2 years ago
- Date parsing and normalization utilities for Python. ☆22 Updated 2 years ago
- A Node.js wrapper around the DocumentCloud API. ☆12 Updated 8 years ago
- JavaScript app for displaying annotated network graphs based on data from LittleSis ☆102 Updated 3 months ago