qedsoftware / multipage-ocr
(Python) Execute tesseract OCR on a multi-page PDF.
☆18 Updated last year
Alternatives and similar repositories for multipage-ocr
Users who are interested in multipage-ocr are comparing it to the libraries listed below.
- Convert a corpus of PDFs to clean text files on a distributed architecture ☆38 Updated last year
- Tools for analyzing the Hillary Clinton emails ☆13 Updated 9 years ago
- Demo of the Newspaper article extraction library. ☆29 Updated 10 years ago
- Python wrapper for xpdf ☆19 Updated 5 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs ☆15 Updated 5 years ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- Automated NLP sentiment predictions: batteries included, or use your own data ☆18 Updated 7 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis. ☆79 Updated 2 years ago
- Detecting Mines in the Democratic Republic of Congo via Satellite Imagery ☆12 Updated 2 years ago
- Python and pandas tools to perform various analyses on different types of word lists ☆16 Updated 10 years ago
- A Python client for connecting to all the services provided by https://dandelion.eu ☆36 Updated last year
- Elasticsearch-like search engine supporting real-time indexing and querying ☆15 Updated 8 years ago
- This project scrapes text from Telugu books (novels) ☆10 Updated 3 years ago
- Data notification service: subscribe to keywords and get notified whenever an open data source mentions that keyword. ☆24 Updated 11 years ago
- Force Based Network Visualization Library With Automated Scaling To Prevent Node Overlap, Label Adjustment & Easy Community Visualization… ☆15 Updated 6 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- (BROKEN, help wanted) ☆15 Updated 9 years ago
- Trading Consequences data and code ☆15 Updated 10 years ago
- Tools for working with Optical Character Recognition output ☆16 Updated 11 years ago
- Getting, analysing and displaying lists of papers ☆15 Updated 7 months ago
- Convert text from PDF to XML. ☆45 Updated 6 years ago
- A base library for building web scrapers for statistical data, and a helper ontology for (primarily Swedish) statistical data. ☆13 Updated 2 months ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- Pipeline for distributed Natural Language Processing, made in Python ☆64 Updated 8 years ago
- Text Thresher crowd-sourced text annotator ☆16 Updated 7 years ago
- Binary Python bindings for poppler utils for content extraction ☆42 Updated 4 years ago
- Keyword Extraction system using Brown Clustering (this version is trained to extract keywords from job listings) ☆18 Updated 10 years ago
- Word Religion Projections (2010-2050) ☆16 Updated 6 months ago
- Code for recon16 hack day ☆16 Updated 7 years ago
- RESTful API around the PETRARCH coding software ☆10 Updated 4 years ago