qedsoftware / multipage-ocr
(Python) Execute Tesseract OCR on a multi-page PDF.
☆19 · Updated 2 years ago
Alternatives and similar repositories for multipage-ocr
Users interested in multipage-ocr are comparing it to the repositories listed below.
- Convert a corpus of PDFs to clean text files on a distributed architecture ☆38 · Updated last year
- Easily display Zotero items on a webpage ☆32 · Updated 2 years ago
- Orange Data Mining Homepage ☆17 · Updated 6 years ago
- Use visual programming to build data tables based on text data within the Orange data mining software environment ☆29 · Updated 2 weeks ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 · Updated 3 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 · Updated 4 years ago
- Convert text from PDF to XML. ☆45 · Updated 7 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 · Updated 2 years ago
- Python wrapper for xpdf ☆19 · Updated 5 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 · Updated 3 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 · Updated 4 years ago
- Installer for Thymeflow, a personal knowledge management system. ☆34 · Updated 7 years ago
- 🍊 Data fusion add-on for Orange3 ☆16 · Updated 5 years ago
- Soundex Phonetic Code Algorithm Demo for Indian Languages. Supports all Indian languages and English. Provides intra-Indic string compari… ☆58 · Updated 6 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 · Updated 2 years ago
- (BROKEN, help wanted) ☆15 · Updated 9 years ago
- An expandable and scalable OCR pipeline ☆87 · Updated 7 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs ☆15 · Updated 6 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆55 · Updated 3 years ago
- A Python framework for deploying recommendation models for form fields. ☆10 · Updated 3 years ago
- Tools for analyzing the Hillary Clinton emails ☆13 · Updated 9 years ago
- Date parsing and normalization utilities for Python. ☆22 · Updated 2 years ago
- Palladio Application ☆42 · Updated 4 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆14 · Updated 8 months ago
- Next generation OCR engine based on LSTMs. ☆52 · Updated 7 years ago
- ☆17 · Updated 3 months ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆18 · Updated 11 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆45 · Updated last week
- Monitor datasets and get alerts when something happens ☆210 · Updated 6 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆57 · Updated last year