qedsoftware / multipage-ocr
(Python) Execute Tesseract OCR on a multi-page PDF.
☆18 Updated last year
Alternatives and similar repositories for multipage-ocr:
Users who are interested in multipage-ocr are comparing it to the libraries listed below.
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated 11 months ago
- A trend viewer written in Python/JavaScript ☆21 Updated 3 months ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆15 Updated 5 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 3 years ago
- Tools for analyzing the Hillary Clinton emails ☆13 Updated 8 years ago
- A toolkit for mapping networks of political and economic influence through diverse types of entities and their relations. Accessible at h… ☆186 Updated 3 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 6 years ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 Updated 9 years ago
- ☆27 Updated 2 weeks ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 Updated 11 years ago
- Execute OpenRefine JSON scripts without OpenRefine (or Java) ☆29 Updated 2 years ago
- Data notification service: subscribe to keywords and get notified whenever an open data source mentions that keyword. ☆24 Updated 11 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆65 Updated 8 years ago
- 🍊 🎓 Educational widgets for machine learning and data mining in Orange 3. ☆27 Updated 11 months ago
- Date parsing and normalization utilities for Python. ☆22 Updated last year
- python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. Wi… ☆18 Updated 3 weeks ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- 🚀 GUI for training spaCy models ☆54 Updated 3 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 Updated 2 years ago
- Sanskrit Corpus ☆16 Updated 8 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery ☆54 Updated 7 months ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 Updated 5 years ago
- Stylometric framework in Python ☆13 Updated 9 years ago
- NYT Risk Semantics Project ☆12 Updated 8 years ago
- Palladio Application ☆40 Updated 3 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings) ☆17 Updated 10 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated last year
- Tools for working with Optical Character Recognition output ☆16 Updated 10 years ago
- Named-Entity Recognition extension for Google Refine / OpenRefine ☆72 Updated 7 years ago