html-extract / hext
Domain-specific language for extracting structured data from HTML documents
☆54 Updated 5 months ago
Alternatives and similar repositories for hext
Users who are interested in hext are comparing it to the libraries listed below
- Browsertrix: Containerized High-Fidelity Browser-Based Automated Crawling + Behavior System ☆87 Updated 4 years ago
- The Datasette macOS application ☆131 Updated last year
- A suite of focused and simple tools and activities for journalists, data journalism classrooms and community advocacy groups ☆63 Updated 2 weeks ago
- My personally curated list of bash/command-line commands and snippets that are very useful yet I keep on forgetting ☆19 Updated 3 years ago
- Add website scraping abilities to Datasette ☆63 Updated 2 years ago
- a simple graph shell to explore ideas ☆116 Updated 2 months ago
- Now included in rigour ☆152 Updated last month
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- 📑 Read a Google Drive Doc and convert to JSON (via ArchieML) ☆22 Updated 7 years ago
- Datasette plugin for visualizing data using Vega ☆61 Updated 2 years ago
- Twitter, quick. Fetch and store tweets on short notice. ☆79 Updated 8 years ago
- Browser version of Hyphe (WIP) ☆31 Updated 5 months ago
- JavaScript app for displaying annotated network graphs based on data from LittleSis ☆102 Updated 2 months ago
- experiments in sorting ☆27 Updated 2 years ago
- generate rules from lists of words ☆16 Updated 4 years ago
- An alpha project combining beneficial ownership and contracting data ☆13 Updated 4 years ago
- Computer assisted video/audio transcription ☆97 Updated 5 years ago
- Pre-render Observable notebooks for automation ☆61 Updated 3 years ago
- Pull out versions of specific files from a gitscraping repo into individual files. ☆15 Updated 4 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- Neo4j powered web application for multimedia collections: bring graph-based exploration and crowd-based indexation. ☆24 Updated 5 years ago
- NW.js OS X desktop application that, given a video/audio file, returns a transcription using the IBM Watson Speech to Text API ☆41 Updated 8 years ago
- framework for scraping legislative/government data ☆88 Updated last year
- Schemas to convert common fixed-width file formats into CSV using in2csv. ☆125 Updated 4 years ago
- a client-side transcription text editor to proofread and correct the text before re-alignment back on the server. ☆19 Updated 7 years ago
- API implementation, User Interface, and more modules of the IPTC EXTRA project ☆13 Updated 3 years ago
- Turn spreadsheet data into a structured, dynamic API. ☆112 Updated 4 months ago
- Extract networks of entities from journalistic reporting ☆48 Updated 2 years ago
- A data pipeline helper written in node to convert a folder of JS/ArchieML/JSON/YAML/CSV/TSV files into usable data. ☆47 Updated 2 years ago
- Web interface for Cayley ☆25 Updated 2 years ago