qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆18 Updated last year
Alternatives and similar repositories for multipage-ocr:
Users who are interested in multipage-ocr are comparing it to the libraries listed below:
- Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs ☆15 Updated 5 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- Convert text from PDF to XML. ☆45 Updated 6 years ago
- Getting, analysing and displaying lists of papers ☆15 Updated 6 months ago
- 🍊 Prototype Orange widgets, only for the brave. ☆12 Updated 4 months ago
- Relatively simple text classification powered by spaCy ☆41 Updated 9 years ago
- A Python client for connecting to all the services provided by https://dandelion.eu ☆36 Updated last year
- Python tools for Tesseract OCR training ☆25 Updated 2 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- A small Docker built for the OCRopus OCR system. ☆20 Updated 7 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated 2 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- NLTK Website ☆62 Updated 8 months ago
- A digital humanities operating system that runs on a USB disk. ☆31 Updated 7 years ago
- Graphical Image Annotation Tool - Digital Humanities - electron client server application with neo4j database for a mxgraph frontend with … ☆17 Updated 4 years ago
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- Data Mining Historical Newspaper Metadata (METS/ALTO formats) ☆25 Updated 2 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Wrapper for DKPro Core to extract linguistic information from books. ☆16 Updated 3 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Updated 7 years ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras… ☆11 Updated 6 years ago
- Modification of bibliotools 2.2 from Sébastian Grauwin ☆11 Updated 5 years ago
- An ultra-simple example of how to use Python to write stories based on a set of data. ☆29 Updated 11 years ago
- Exploring the shapes of stories using indico sentiment analysis APIs ☆28 Updated 9 years ago
- This page is a companion for the paper titled Towards Automatic Structuring and Semantic Indexing of Legal Documents ☆29 Updated 6 years ago
- A scraper focused on organizational GitHub accounts and their members. ☆42 Updated 2 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆29 Updated last month
- Easily display Zotero items on a webpage ☆32 Updated 2 years ago
- NYT Risk Semantics Project ☆12 Updated 9 years ago