harvard-lil / bag-nabit
Download and attach provenance to public datasets
☆33Updated 3 months ago
Alternatives and similar repositories for bag-nabit
Users who are interested in bag-nabit are comparing it to the libraries listed below.
- Data Catalog Vocabulary (DCAT) - United States Profile Chief Data Officers Council & Federal Committee on Statistical Methodology☆71Updated last month
- Adds a reconciliation API endpoint to Datasette, based on the Reconciliation Service API specification.☆24Updated last year
- A tool for collecting archival slivers of the web and web archives☆13Updated 4 months ago
- A static site generator for SPARQL backends.☆130Updated 3 months ago
- poetry from dirty ocr☆60Updated 4 years ago
- Experimental proxy and wrapper for safely embedding Web Archives (warc, warc.gz, wacz) into web pages.☆34Updated 2 months ago
- 🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.☆162Updated 3 weeks ago
- Command line tool for digging into WARC files☆43Updated 2 weeks ago
- (Experimental) High-fidelity capture of Twitter threads as sealed PDFs.☆54Updated last year
- Web application for distributed compute analysis of Archive-It web archive collections.☆19Updated 3 months ago
- Add website scraping abilities to Datasette☆64Updated 2 years ago
- ☆24Updated 2 years ago
- Browser-based app for segmenting & OCRing PDF pages based on whitespace rules. To assist researchers (especially in the humanities) with …☆12Updated last year
- Command-line tool and Rust library for handling Web ARChive (WARC) files☆20Updated last month
- search interface for scholarly works☆85Updated 11 months ago
- WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.☆256Updated 5 months ago
- Bad link reporter for GitHub repositories☆12Updated last year
- Python package to reconcile DataFrames☆24Updated 2 years ago
- Tools for running OCR against files stored in S3☆119Updated 2 years ago
- CSV on the web☆42Updated 4 months ago
- Generates large collages of images using OpenSeadragon☆49Updated last year
- ☆25Updated 2 years ago
- Create Robust Links from within Zotero☆20Updated 3 years ago
- Datasette plugin to create interactive dashboards☆147Updated 2 weeks ago
- Build a search index across content from multiple SQLite database tables and run faceted searches against it using Datasette☆195Updated 3 years ago
- Bagit-based data packaging specification for dissemination of research data with useful human and machine readable metadata: "Make Data C…☆39Updated 6 years ago
- Tracking the history of trees in San Francisco☆46Updated last week
- Metadata management and dissemination system for Open Access books☆54Updated last week
- DocumentCloud's back end source code - Please report bugs, issues and feature requests to info@documentcloud.org☆40Updated this week
- A Twitter, Mastodon, and Bluesky bot that shares new interactive, graphic, and data vis stories from newsrooms around the world☆58Updated this week