qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆19 Updated 2 years ago
Alternatives and similar repositories for multipage-ocr
Users who are interested in multipage-ocr are comparing it to the libraries listed below
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆29 Updated this week
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- 🍊 Prototype Orange widgets — only for the brave. ☆12 Updated last month
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆15 Updated 6 years ago
- Monitor datasets, get alerts when something happens ☆210 Updated 6 years ago
- (BROKEN, help wanted) ☆15 Updated 9 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 Updated 2 years ago
- PST extraction and analytic pipeline ☆37 Updated 7 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated 2 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 Updated 3 years ago
- Convert text from PDF to XML. ☆45 Updated 7 years ago
- Tools for analyzing the Hillary Clinton emails ☆13 Updated 9 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- Detect and visualize text reuse ☆118 Updated last year
- Friendly Slack bot for looking up cases ☆21 Updated 7 years ago
- Python library and command line tool for converting data from one format to another ☆99 Updated 5 years ago
- A library for extracting tables from PDF files ☆92 Updated 5 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆98 Updated 3 years ago
- Ergonomic line-by-line transcription of scanned text. ☆53 Updated 4 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆55 Updated 3 years ago
- Python bindings for Apache Tika ☆24 Updated 5 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆47 Updated 3 years ago
- Frontend component for Hoaxy, a tool to visualize the spread of claims and fact checking ☆72 Updated 3 years ago
- ☆11 Updated 6 years ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆189 Updated 4 years ago
- framework for scraping legislative/government data ☆88 Updated last year
- PDF analysis. Convert contents of PDF to a JSON-style python dictionary. ☆31 Updated 3 years ago