qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆18Updated last year
Alternatives and similar repositories for multipage-ocr
Users who are interested in multipage-ocr are comparing it to the libraries listed below.
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs☆15Updated 5 years ago
- 🍊 Prototype Orange widgets — only for the brave.☆12Updated 6 months ago
- Stylometric framework in Python☆17Updated 10 years ago
- Python wrapper for xpdf☆19Updated 5 years ago
- (BROKEN, help wanted)☆15Updated 9 years ago
- Convert text from PDF to XML.☆45Updated 6 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings)☆18Updated 10 years ago
- Data notification service: subscribe to keywords and get notified whenever an open data source mentions that keyword.☆24Updated 11 years ago
- Plots various graphs for a series of plaintext files in a directory☆19Updated 9 years ago
- Tools for analyzing the Hillary Clinton emails☆13Updated 9 years ago
- ☆18Updated 6 years ago
- Python bindings for Apache Tika☆22Updated 4 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture☆39Updated last year
- Static assets for oldnyc.org☆8Updated 2 months ago
- Google Refine extension for adding columns (extending data) from DBpedia☆39Updated 11 years ago
- [archived]☆18Updated 3 years ago
- Binary Python bindings for poppler utils for content extraction☆42Updated 4 years ago
- Trading Consequences data and code☆15Updated 10 years ago
- A digital humanities operating system that runs on a USB disk.☆31Updated 7 years ago
- PST extraction and analytic pipeline☆37Updated 7 years ago
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything.☆18Updated 2 years ago
- Next generation OCR engine based on LSTMs.☆52Updated 7 years ago
- An index data structure for approximate string search.☆23Updated 6 years ago
- Tools for working with Optical Character Recognition output☆16Updated 11 years ago
- OpenRefine is a free, open source power tool for working with messy data and improving it. This repository contains Dockerbuild files fo…☆21Updated 3 years ago
- Monitor datasets, get alerts when something happens☆210Updated 6 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans☆10Updated 10 years ago
- An ultra-simple example of how to use Python to write stories based on a set of data.☆29Updated 11 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery☆57Updated 10 months ago
- Scripts that clean up OCR and munge Hathi metadata.☆76Updated 7 years ago