internetarchive / analyze_ocr
Parse OCR result files for pagenos, tables of contents, etc.
☆14 Updated 12 years ago
Related projects:
- utility to fetch provenance information from Internet Archive's Wayback Machine ☆13 Updated 2 years ago
- A Rails engine supporting the discovery of web archives. ☆48 Updated last year
- A tool for the geospatial analysis, literary network visualization, and plot mapping of ancient texts ☆14 Updated 6 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆87 Updated last year
- Prototype SOLR-powered web archive exploration UI. ☆42 Updated 4 years ago
- WASAPI data transfer APIs ☆42 Updated 2 years ago
- A Data Parsing/Data Manipulation Tool Supporting Digitization Projects and Other Data Analysis Projects ☆47 Updated 4 years ago
- Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents. ☆31 Updated 6 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆117 Updated 9 years ago
- Open ONI (Open Online Newspaper Initiative) Django web app ☆47 Updated 2 months ago
- No longer maintained. Please use conciliator instead. ☆27 Updated 3 years ago
- Open-source tools for working with BIBFRAME (see: http://bibframe.org), by default BIBFRAME Lite (see: http://bibfra.me) and more general… ☆23 Updated 3 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆35 Updated 6 months ago
- simple script to convert web resources to a single warc file ☆18 Updated last year
- Embedr.eu - Image Embedding Service (IES) with support for IIIF, OEmbed, zoomable viewer in an iFrame ☆15 Updated 8 years ago
- Tools for tracking stories on news homepages ☆48 Updated 4 years ago
- Docker image for the Archives Unleashed Toolkit ☆12 Updated last year
- ☆19 Updated this week
- Trading Consequences data and code ☆15 Updated 9 years ago
- Test cases for validating BagIt implementations ☆10 Updated last year
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆23 Updated 2 years ago
- Tools for helping you work with web platform archive downloads. ☆17 Updated 4 years ago
- A python client for the DPLA API ☆43 Updated last year
- a CLI suggestion tool for Wikidata entities ☆29 Updated 7 years ago
- Download digitized books from Internet Archive and view with IIIF, locally and offline. ☆34 Updated 5 months ago
- ☆27 Updated this week
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆100 Updated last month
- Scripts for scraping metadata from Academia.edu and migrating publications into Zenodo.org via its REST API ☆11 Updated 7 years ago
- A Memento Aggregator CLI and Server in Go ☆55 Updated 4 months ago