html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆54 Updated 2 months ago
Alternatives and similar repositories for hext
Users that are interested in hext are comparing it to the libraries listed below
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆19 Updated 3 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 Updated 7 years ago
- Add website scraping abilities to Datasette ☆66 Updated 2 years ago
- Twitter, quick. Fetch and store tweets on short notice. ☆79 Updated 9 years ago
- a simple graph shell to explore ideas ☆117 Updated 4 months ago
- a work-in-progress guide to web scraping as an artistic and critical practice ☆84 Updated 2 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- Dead simple cron service for making HTTP calls on a regular schedule. ☆14 Updated 5 years ago
- Add editing UI and other power-user features to Datasette. ☆13 Updated 2 years ago
- ☆86 Updated 3 years ago
- Datasette plugin for visualizing data using Vega ☆61 Updated last month
- Pull out versions of specific files from a gitscraping repo into individual files. ☆14 Updated 4 years ago
- Pre-render Observable notebooks for automation ☆62 Updated 3 years ago
- experiments in sorting ☆27 Updated 3 years ago
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups ☆63 Updated 2 weeks ago
- Extract networks of entities from journalistic reporting ☆49 Updated 2 years ago
- Turn spreadsheet data into a structured, dynamic API. ☆112 Updated 6 months ago
- Now included in rigour ☆152 Updated last month
- NWJS os x desktop based application that given a video/audio file returns a transcription using IBM Watson Speech to text API ☆41 Updated 8 years ago
- 📜 A tiny custom element for all your scrollytelling needs! ☆27 Updated 3 years ago
- An open-source archive that gathers, saves, shares and analyzes news homepages ☆149 Updated last week
- Interactive open source visualization platform for multivariate dynamic networks. ☆94 Updated 3 years ago
- The Cartography of DH2020 is based on an innovative visual method to explore conference speakers. In a moment in which conferences went o… ☆11 Updated last year
- Computer assisted video/audio transcription ☆97 Updated 5 years ago
- generate rules from lists of words ☆16 Updated 4 years ago
- Export Airtable data to YAML, JSON or SQLite files on disk ☆130 Updated last year
- Snowclone a Minute! You too can write an annoying twitter bot of your choosing. ☆11 Updated 8 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆47 Updated 8 years ago
- A lightweight JavaScript client library for the Wikimedia Pageviews API for Wikipedia and various of its sister projects for Node.js and … ☆27 Updated 4 years ago