soodoku / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
☆15Updated 5 years ago
Alternatives and similar repositories for image-to-text:
Users who are interested in image-to-text are comparing it to the libraries listed below.
- (Python) Execute tesseract OCR on a multi-page PDF.☆18Updated last year
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]☆24Updated 8 years ago
- An online reference for data journalism☆25Updated 10 years ago
- A glossary for the United States.☆42Updated 9 years ago
- Monitor datasets, get alerts when something happens☆210Updated 6 years ago
- Examples of bad data, especially from government.☆23Updated 7 months ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…☆12Updated 10 years ago
- A scraper focused on organizational GitHub accounts and their members.☆42Updated 2 years ago
- Tools for working with Optical Character Recognition output☆16Updated 11 years ago
- Date parsing and normalization utilities for Python.☆22Updated last year
- Responsively embed DocumentCloud notes.☆21Updated 6 years ago
- Data notification service: subscribe to keywords and get notified whenever an open data source mentions that keyword.☆24Updated 11 years ago
- Tools for tracking stories on news homepages☆48Updated 5 years ago
- Ask questions about government data.☆37Updated 6 years ago
- GenderTracker is a service that decomposes articles and computes various gender-related metrics based on the content.☆25Updated 11 years ago
- Python scraper to get weekly CDC flu surveillance data☆25Updated 10 years ago
- A library and command-line tool for fetching Facebook Pages' published posts.☆13Updated 7 years ago
- Scraper built with Scrapy.☆15Updated 7 months ago
- Code to remove "noise" from hOCR output of Tesseract OCR.☆14Updated 8 years ago
- ☆11Updated 9 years ago
- Repository of the GINI index official repository.☆15Updated 3 weeks ago
- Legislative data from the congress repository☆19Updated 11 years ago
- Collections of English historical texts and data relating to them☆18Updated 4 years ago
- Code for extracting data from a large number of PDFs, particularly FCC political ad documents☆15Updated 7 years ago
- Corruption Perceptions Index - CPI☆18Updated 5 months ago
- Open Knowledge coding standards and style guide.☆35Updated 5 years ago
- Scan a folder of document files of all types and extract the text into a CSV suitable for Overview☆26Updated 9 years ago
- Utilities for retrieving whitehouse.gov transcripts and matching news quotes to them☆15Updated 10 years ago
- Capstone GRS Website☆7Updated 5 years ago
- Python utilities to make it a little easier to set up and run a Twitter bot☆40Updated last year