gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
☆16 Updated 6 years ago
Alternatives and similar repositories for image-to-text
Users that are interested in image-to-text are comparing it to the libraries listed below
- Tools for tracking stories on news homepages ☆48 Updated 6 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 Updated 9 years ago
- Monitor datasets, gets alerts when something happens ☆210 Updated 7 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 Updated 11 years ago
- Focused Crawler for VT's CTRNet ☆10 Updated 12 years ago
- ☆36 Updated 2 years ago
- A repository of materials for a proposed class on automated story bots. ☆49 Updated 7 years ago
- Scraper built with Scrapy. ☆18 Updated last year
- Near-duplicate detection tool ☆24 Updated 9 years ago
- Python natural language processing work ☆29 Updated 16 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Pure python script that takes user query and summarizes news related to it. ☆25 Updated 3 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 Updated 10 years ago
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc… ☆41 Updated 8 years ago
- Open Source Social Media Monitoring And Engagement System Core/API ☆37 Updated 11 years ago
- Python library and command line tool for converting data from one format to another ☆99 Updated 5 years ago
- A platform for collecting, analyzing, and visualizing social media data. ☆12 Updated 5 years ago
- A place to collect and share knowledge about liberating data from PDFs ☆55 Updated 3 years ago
- Data Mining Historical Newspaper Metadata (METS/ALTO formats) ☆25 Updated 3 years ago
- A queue-controlled browser automation tool for improving web crawl quality ☆64 Updated 4 months ago
- Contains the implementation of algorithms that estimate the geographic location of media content based on their content and metadata. It … ☆15 Updated 9 years ago
- code to remove "noise" from hOCR output of Tesseract OCR. ☆14 Updated 9 years ago
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆69 Updated 2 years ago
- Wrapper to pocketsphinx phoneme labeling tools ☆18 Updated 9 years ago
- Date parsing and normalization utilities for Python. ☆22 Updated 2 years ago
- Blog crawler for the blogforever project. ☆23 Updated 11 years ago
- Watching the SCOTUS ☆179 Updated 10 years ago
- NWJS os x desktop based application that given a video/audio file returns a transcription using IBM Watson Speech to text API ☆41 Updated 8 years ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 Updated 13 years ago
- Responsively embed DocumentCloud pages. ☆22 Updated 7 years ago