gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
☆15Updated 5 years ago
Alternatives and similar repositories for image-to-text
Users who are interested in image-to-text are comparing it to the libraries listed below
- Pure python script that takes user query and summarizes news related to it.☆25Updated 2 years ago
- Legislative data from the congress repository☆19Updated 11 years ago
- Want to learn more about Free Law Project technologies, policies and thinking? Get the literature here.☆23Updated 3 years ago
- Scraper built with Scrapy.☆17Updated 9 months ago
- Code to remove "noise" from the hOCR output of Tesseract OCR.☆14Updated 8 years ago
- Examples of bad data, especially from government.☆23Updated 9 months ago
- Collections of English historical texts and data relating to them☆18Updated 4 years ago
- An online reference for data journalism☆25Updated 11 years ago
- A little artoo.js bookmarklet to scrape and download the wanted or missing person lists from Interpol.☆12Updated 10 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]☆24Updated 8 years ago
- An App Engine app that generates OPMLs from spreadsheets.☆12Updated 13 years ago
- Ready or Not...☆50Updated 7 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas…☆12Updated 11 years ago
- (Python) Execute tesseract OCR on a multi-page PDF.☆18Updated last year
- Service for creating Twitter datasets for research and archiving.☆26Updated 2 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data☆12Updated 7 years ago
- Data Mining Historical Newspaper Metadata (METS/ALTO formats)☆25Updated 2 years ago
- 🎞 transcribe > annotate > remix > publish video and audio content☆21Updated 11 months ago
- Scripts for Internet Archive☆13Updated last month
- Files and code related to the Early Modern OCR Project (eMOP) at the IDHMC☆16Updated 10 years ago
- ☆35Updated last year
- Utility to fetch provenance information from Internet Archive's Wayback Machine☆13Updated 2 years ago
- A LevelDB backed URL unshortening microservice written in JavaScript☆31Updated 2 years ago
- A library and command-line tool for fetching Facebook Pages' published posts.☆13Updated 7 years ago
- Tools for tracking stories on news homepages☆48Updated 5 years ago
- List of FIPS (Federal Information Processing Standards) region codes☆10Updated 4 months ago
- Responsively embed DocumentCloud notes.☆21Updated 6 years ago
- Google Refine extension for adding columns (extending data) from DBpedia☆39Updated 11 years ago
- Global Data Journalists Directory☆10Updated 6 years ago
- A design prototype for DocNow to learn with☆14Updated 8 years ago