gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs
☆15 · Updated 5 years ago
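The description above names a simple pipeline: rasterize each PDF page, run Tesseract on each page image, and save the text. Below is a minimal sketch of that pipeline. It is not the repository's own code. It assumes the Poppler `pdftoppm` and `tesseract` command-line tools are installed and on `PATH`, and the helper names (`find_pdfs`, `output_path`, `page_index`, `ocr_pdf`, `ocr_directory`) are hypothetical.

```python
"""Sketch: OCR every PDF in a directory by calling Poppler and Tesseract.

Assumes `pdftoppm` (Poppler) and `tesseract` are on PATH.
Hypothetical helpers; not the image-to-text repository's code.
"""
import subprocess
import tempfile
from pathlib import Path


def find_pdfs(directory: Path) -> list[Path]:
    """Return regular files directly inside `directory` with a .pdf suffix (any case), sorted."""
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_path(pdf: Path, out_dir: Path) -> Path:
    """Map e.g. scans/report.pdf to out_dir/report.txt."""
    return out_dir / (pdf.stem + ".txt")


def page_index(image: Path) -> int:
    """Extract the page number from a pdftoppm file name such as page-7.png or page-07.png."""
    return int(image.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf: Path, dpi: int = 300) -> str:
    """Rasterize each page with pdftoppm, then OCR each page image with tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        # pdftoppm writes <prefix>-<n>.png, one PNG per page
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)], check=True
        )
        # Sort numerically so page-10 comes after page-2
        pages = sorted(Path(tmp).glob("page-*.png"), key=page_index)
        texts = []
        for page in pages:
            # Passing "stdout" as the output base makes tesseract print the text
            result = subprocess.run(
                ["tesseract", str(page), "stdout"],
                check=True,
                capture_output=True,
                text=True,
            )
            texts.append(result.stdout)
    return "\n".join(texts)


def ocr_directory(in_dir: Path, out_dir: Path) -> list[Path]:
    """OCR every PDF in `in_dir` and write one .txt file per PDF into `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in find_pdfs(in_dir):
        target = output_path(pdf, out_dir)
        target.write_text(ocr_pdf(pdf), encoding="utf-8")
        written.append(target)
    return written
```

For example, `ocr_directory(Path("scans"), Path("text"))` would write the text of `scans/report.pdf` to `text/report.txt`. Calling the command-line tools through `subprocess` avoids extra Python dependencies. A wrapper library such as pytesseract is a common alternative.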
Alternatives and similar repositories for image-to-text
Users who are interested in image-to-text are comparing it to the libraries listed below.
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 · Updated 8 years ago
- (Python) Execute tesseract OCR on a multi-page PDF. ☆18 · Updated last year
- An online reference for data journalism ☆25 · Updated 11 years ago
- Monitor datasets, gets alerts when something happens ☆210 · Updated 6 years ago
- ☆18 · Updated 9 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 · Updated 11 years ago
- c-span opened captions node buffer server + google docs apps script ☆9 · Updated 5 years ago
- Tools for tracking stories on news homepages ☆48 · Updated 5 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 · Updated 10 years ago
- Scraper built with Scrapy. ☆17 · Updated 9 months ago
- Adds read support for Excel files (xls and xlsx) to agate. ☆17 · Updated 3 months ago
- Python scraper to get weekly CDC flu surveillance data ☆25 · Updated 10 years ago
- A glossary for the United States. ☆42 · Updated 10 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 · Updated 8 years ago
- Examples of bad data, especially from government. ☆23 · Updated 10 months ago
- Date parsing and normalization utilities for Python. ☆22 · Updated last year
- Capstone GRS Website ☆7 · Updated 5 years ago
- Performs user classification into labels using a set of seed Twitter users with known labels and the structure of the interaction network… ☆10 · Updated 8 years ago
- A tool to allow US addresses to be geocoded/georeferenced easily, without using Python or the command line or paid services or anything. ☆18 · Updated 2 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Updated 11 years ago
- Code for extracting data from a large number of PDFs, particularly FCC political ad documents ☆15 · Updated 7 years ago
- Pure python script that takes user query and summarizes news related to it. ☆25 · Updated 2 years ago
- South Africa's by-laws in XML format ☆18 · Updated 6 years ago
- Global Data Journalists Directory ☆10 · Updated 6 years ago
- Charts for the Consumer Financial Protection Bureau ☆12 · Updated last year
- Ask questions about government data. ☆37 · Updated 6 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 · Updated 9 years ago
- Tools for working with Optical Character Recognition output ☆16 · Updated 11 years ago
- A parser for the Virginia State Corporation Commission's business registration records. ☆20 · Updated 10 months ago
- This is a side project from 2008. This package contains a tool for automatically cropping and deskewing images of book pages captured by … ☆28 · Updated 12 years ago