gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
☆15 Updated 5 years ago
Alternatives and similar repositories for image-to-text
Users that are interested in image-to-text are comparing it to the libraries listed below
- Monitor datasets, get alerts when something happens ☆210 Updated 6 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 8 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago
- An online reference for data journalism ☆25 Updated 11 years ago
- (Python) Execute tesseract OCR on a multi-page PDF. ☆18 Updated 2 years ago
- Tools for tracking stories on news homepages ☆48 Updated 5 years ago
- stoplists for African languages generated from the ASP corpus ☆14 Updated 9 years ago
- Scraper built with Scrapy. ☆18 Updated 11 months ago
- Pure python script that takes user query and summarizes news related to it. ☆25 Updated 3 years ago
- Examples of bad data, especially from government. ☆23 Updated 11 months ago
- A place to collect and share knowledge about liberating data from PDFs ☆54 Updated 3 years ago
- Bot software for creating Wikipedia articles using geographical data ☆10 Updated 8 years ago
- Legislative data from the congress repository ☆19 Updated 11 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 7 years ago
- files and code related to the Early Modern OCR Project (eMOP) at the IDHMC ☆16 Updated 10 years ago
- ☆10 Updated 9 years ago
- Classifying the content of domains ☆56 Updated 2 years ago
- A Wikidata puzzle game ☆19 Updated 8 years ago
- Python library and command line tool for converting data from one format to another ☆99 Updated 5 years ago
- List of easy American-English words: The New Dale-Chall (1995) ☆32 Updated 2 years ago
- Want to learn more about Free Law Project technologies, policies and thinking? Get the literature here. ☆23 Updated 4 years ago
- A glossary for the United States. ☆42 Updated 10 years ago
- A scraper focused on organizational Github accounts and their members. ☆42 Updated 2 years ago
- Data Driven Journalism Handbook ☆21 Updated 12 years ago
- A LevelDB backed URL unshortening microservice written in JavaScript ☆31 Updated 2 years ago
- ☆18 Updated 9 years ago
- Extract images from PDF documents. Works on multiple and single PDF files ☆14 Updated 8 years ago
- JSON schemas for OpenCorporates data ☆20 Updated 2 months ago
- Date parsing and normalization utilities for Python. ☆22 Updated last year
- Global Data Journalists Directory ☆10 Updated 6 years ago