gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
☆15 · Updated 6 years ago
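Tesseract does not read PDF files directly, so OCRing a directory of PDFs from Python usually takes two steps: render each page to an image, then run Tesseract on each image. The following is a minimal sketch of that pipeline. It assumes Poppler's `pdftoppm` and the `tesseract` command-line tool are installed and on `PATH`. The function names `find_pdfs`, `ocr_pdf`, and `ocr_directory` are hypothetical, and this is not the image-to-text repository's own code.

```python
"""Sketch: OCR every PDF in a directory by shelling out to Poppler's pdftoppm
(PDF -> PNG pages) and the tesseract CLI (PNG -> text).

Assumes both executables are on PATH. Hypothetical helper names; not the
image-to-text repository's implementation."""
import subprocess
import tempfile
from pathlib import Path


def find_pdfs(directory):
    """Return the regular .pdf files directly inside `directory`, sorted."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def ocr_pdf(pdf_path, dpi=300):
    """Rasterize each page of `pdf_path`, OCR it, and return the joined text."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        # pdftoppm writes page-1.png, page-2.png, ... (zero-padded to equal
        # width for longer documents, so a plain sort keeps page order).
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)],
            check=True,
        )
        texts = []
        for page in sorted(Path(tmp).glob("page-*.png")):
            # The output base "stdout" makes tesseract print the text
            # instead of writing a .txt file.
            result = subprocess.run(
                ["tesseract", str(page), "stdout"],
                check=True, capture_output=True, text=True,
            )
            texts.append(result.stdout)
        return "\n".join(texts)


def ocr_directory(directory, out_dir=None):
    """OCR each PDF in `directory`, writing <stem>.txt beside it or into out_dir."""
    written = []
    for pdf in find_pdfs(directory):
        target = Path(out_dir or pdf.parent) / (pdf.stem + ".txt")
        target.write_text(ocr_pdf(pdf), encoding="utf-8")
        written.append(target)
    return written
```

Calling `ocr_directory("scans")` would write one `.txt` file per PDF in `scans/`. Shelling out to the command-line tools keeps the sketch free of Python dependencies. Wrappers such as `pytesseract` are a common alternative.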
Alternatives and similar repositories for image-to-text
Users who are interested in image-to-text are comparing it to the libraries listed below.
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 · Updated 9 years ago
- Monitor datasets, gets alerts when something happens ☆210 · Updated 6 years ago
- Quill Grammar App ☆11 · Updated 7 years ago
- Classifying the content of domains ☆57 · Updated 2 months ago
- A place to collect and share knowledge about liberating data from PDFs ☆55 · Updated 3 years ago
- ScraperWiki Python library for scraping and saving data ☆158 · Updated 2 years ago
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc… ☆40 · Updated 8 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 · Updated last year
- Tools for tracking stories on news homepages ☆48 · Updated 6 years ago
- Visualize geo-located tweets in real time, parse them, use them to write bot-assisted poetic-text, then ship that text to people within c… ☆13 · Updated 8 years ago
- Python scraper to get weekly CDC flu surveillance data ☆25 · Updated 10 years ago
- Python library and command line tool for converting data from one format to another ☆99 · Updated 5 years ago
- R client for the Virustotal Public API. Virustotal is a Google service that analyzes files and URLs for viruses etc. ☆12 · Updated 2 months ago
- (Python) Execute tesseract OCR on a multi-page PDF. ☆19 · Updated 2 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆63 · Updated 3 months ago
- Focused Crawler for VT's CTRNet ☆10 · Updated 12 years ago
- Uses NLP methods to parse and classify contracts from The City of New Orleans ☆10 · Updated 10 years ago
- Ruby script to download bulk results from Archive.org's TV News database of closed captions ☆14 · Updated 12 years ago
- A repository of materials for a proposed class on automated story bots. ☆49 · Updated 7 years ago
- Compare coverage across different media sources using the Juicer ☆12 · Updated 9 years ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 · Updated 13 years ago
- ☆36 · Updated 2 years ago
- Investigative tool for extracting relevant areas from many documents ☆14 · Updated 10 years ago
- Near-duplicate detection tool ☆24 · Updated 8 years ago
- The Face-o-Matic 2000 finds known faces on TV ☆19 · Updated 7 years ago
- "Old SFM" -- manage rules and streams from social data sources, starting with twitter. ☆86 · Updated 2 years ago
- 70+ years of data on Network television. Attributes of shows, race & gender of cast members, directors, producers, presenters, etc. ☆10 · Updated last year
- Workshop bringing together individuals interested in developing curriculum, workflows, and tools to strengthen reproducibility in researc… ☆33 · Updated 10 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆96 · Updated 7 years ago
- JavaScript based graph visualization library with emphasis on customization and modularity. ☆13 · Updated 6 years ago