fritz-hh / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆261Updated 9 years ago
Alternatives and similar repositories for OCRmyPDF
Users who are interested in OCRmyPDF compare it to the repositories listed below
- A toolbox and web application for working with and presenting textual material from Shakespeare to Schopenhauer, and letters to literatur…☆149Updated 10 years ago
- ☆29Updated 8 years ago
- Politwoops web front end☆44Updated 7 years ago
- “Let Me Get That Data For You” catalogs the machine-readable data on a given domain name. [RETIRED]☆102Updated 10 years ago
- Modular workflow assistant for book digitization☆128Updated 9 years ago
- Docker container to provide Apache Tika RESTful API☆41Updated 9 years ago
- Moved to:☆58Updated 6 years ago
- Enhanced Social Tagging for Academic Communities☆97Updated 10 months ago
- A push-button Digital Humanities laboratory.☆127Updated 7 years ago
- Slaw is a lightweight library for rendering and generating Akoma Ntoso acts from plain text and PDF documents.☆27Updated 3 years ago
- Turns legal citations in the DOM into links☆20Updated 8 years ago
- CFPB's streaming batch geocoder☆36Updated 8 years ago
- Extract tables from PDF files☆358Updated 9 years ago
- Scan a folder of document files of all types and extract the text into a CSV suitable for Overview☆26Updated 9 years ago
- Guides and introductions for participating in Labs and some of its projects.☆170Updated 8 years ago
- A website for crowd-sourcing structured election candidate data☆58Updated 5 years ago
- Re-usable wrapper scripts for text document extractors.☆37Updated 9 years ago
- Friendly Slack bot for looking up cases☆21Updated 7 years ago
- command line resource for working with digital primary sources☆28Updated 7 years ago
- CSV grooming, the JS way☆21Updated 6 years ago
- Scripts to create git repositories for ALTO XML texts, like those from the British Library's scanned documents.☆31Updated 7 years ago
- REPO DEPRECATED; see the current version in Lunchbox http://github.com/nprapps/lunchbox☆93Updated 8 years ago
- (Note: This repository is obsolete, please see the new Browsertrix webrecorder/browsertrix) Browser-Based On-Demand Web Archiving Automat…☆39Updated 6 years ago
- Breve☆29Updated 6 years ago
- mirror a website, put it in a bag☆25Updated 2 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file☆121Updated 10 years ago
- A tool for the geospatial analysis, literary network visualization, and plot mapping of ancient texts☆14Updated 6 years ago
- a CLI suggestion tool for Wikidata entities☆30Updated 8 years ago
- International legislative data specifications☆101Updated 2 years ago
- Tables is a simple command-line tool and powerful library for importing data like a CSV or JSON file into relational tables☆88Updated 2 years ago