internetarchive / analyze_ocr
Parse OCR result files for pagenos, tables of contents, etc.
☆14 · Updated 13 years ago
Alternatives and similar repositories for analyze_ocr:
Users interested in analyze_ocr also compare it to the libraries listed below.
- A Rails engine supporting the discovery of web archives. ☆50 · Updated last year
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba… ☆12 · Updated 2 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆36 · Updated 10 months ago
- WASAPI data transfer APIs ☆43 · Updated 2 years ago
- Crawl Archivematica's Archival Information Packages (AIP) and provide repository-wide reporting. ☆11 · Updated last week
- Utility to fetch provenance information from Internet Archive's Wayback Machine ☆13 · Updated 2 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆120 · Updated 9 years ago
- No longer maintained. Please use conciliator instead. ☆26 · Updated 4 years ago
- Prototype SOLR-powered web archive exploration UI. ☆43 · Updated 4 years ago
- Open ONI (Open Online Newspaper Initiative) Django web app ☆48 · Updated 6 months ago
- This project has been archived and is no longer being developed or supported. The Curator's Workbench is an extensible digital collectio… ☆24 · Updated 4 years ago
- DEPRECATED. Replaced with Electron desktop application: https://github.com/bulk-reviewer/bulk-reviewer ☆13 · Updated 5 years ago
- ☆61 · Updated last year
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆24 · Updated 2 years ago
- A Memento Aggregator CLI and Server in Go ☆61 · Updated 8 months ago
- A persistent repository for PRONOM Research Week activities ☆11 · Updated 3 years ago
- Tools for tracking stories on news homepages ☆48 · Updated 5 years ago
- ☆14 · Updated 7 years ago
- A Data Parsing/Data Manipulation Tool Supporting Digitization Projects and Other Data Analysis Projects ☆47 · Updated 5 years ago
- Selected code and data for The Online Books Page and related applications ☆11 · Updated 3 weeks ago
- GeoNames Reconciliation Service for OpenRefine/LODRefine/Google Refine ☆48 · Updated 2 years ago
- Command-line tile downloader/assembler for IIIF endpoints/manifests ☆33 · Updated 3 years ago
- Shared XSLT Files ☆30 · Updated 3 years ago
- Rails application for the Archives Unleashed Cloud. ☆11 · Updated 3 years ago
- Django app for managing PREMIS Events ☆14 · Updated 2 weeks ago
- Format Identification for Digital Objects (FIDO) is a Python command-line tool to identify the file formats of digital objects. It is des… ☆155 · Updated 2 months ago
- Zotero extension for cataloging ☆45 · Updated 11 months ago
- Command line interface to Wikidata Query Service ☆55 · Updated 9 months ago
- Work to make the ldr PREMIS-compliant ☆8 · Updated 7 years ago
- An IIIF Universe for IIIF catalogs ☆26 · Updated 3 months ago