internetarchive / analyze_ocr
Parse OCR result files for pagenos, tables of contents, etc.
☆14 Updated 12 years ago
Related projects
Alternatives and complementary repositories for analyze_ocr
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆87 Updated last year
- All that entity matching, resolution, normalization, enhancement and reconciliation madness, but with a focus on data, not platforms. ☆24 Updated 2 years ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆36 Updated 8 months ago
- A tool for the geospatial analysis, literary network visualization, and plot mapping of ancient texts ☆14 Updated 6 years ago
- A Rails engine supporting the discovery of web archives. ☆49 Updated last year
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆118 Updated 9 years ago
- Zotero extension for cataloging ☆45 Updated 9 months ago
- Embedr.eu - Image Embedding Service (IES) with support for IIIF, OEmbed, zoomable viewer in an iFrame ☆15 Updated 8 years ago
- Command-line tile downloader/assembler for IIIF endpoints/manifests ☆31 Updated 3 years ago
- This software (prototype) extracts values of Excel spreadsheet properties and calculates a tentative spreadsheet complexity assessment ba… ☆12 Updated last year
- Complement to https://github.com/derek73/python-nameparser for parsing lists of names. ☆16 Updated 9 years ago
- Open ONI (Open Online Newspaper Initiative) Django web app ☆48 Updated 4 months ago
- Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents. ☆31 Updated 7 years ago
- SCAlable Preservation Environments ☆39 Updated 2 years ago
- A digital humanities operating system that runs on a USB disk. ☆31 Updated 7 years ago
- DEPRECATED. Replaced with Electron desktop application: https://github.com/bulk-reviewer/bulk-reviewer ☆13 Updated 5 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… ☆49 Updated last month
- Sort-friendly URI Reordering Transform (SURT) python module ☆40 Updated 3 months ago
- A python client for the DPLA API ☆43 Updated 2 years ago
- utility to fetch provenance information from Internet Archive's Wayback Machine ☆13 Updated 2 years ago
- Prototype SOLR-powered web archive exploration UI. ☆43 Updated 4 years ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆102 Updated last week
- Social Feed Manager user interface application. ☆153 Updated 4 months ago
- Trough: Big data, small databases. ☆40 Updated 3 months ago
- Open-source tools for working with BIBFRAME (see: http://bibframe.org), by default BIBFRAME Lite (see: http://bibfra.me) and more general… ☆23 Updated 3 years ago
- No longer maintained. Please use conciliator instead. ☆26 Updated 4 years ago
- Tools for helping you work with web platform archive downloads. ☆17 Updated 4 years ago
- Docker image for the Archives Unleashed Toolkit ☆12 Updated 2 years ago