gojiplus / image-to-text
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
☆15 · Updated 5 years ago
Alternatives and similar repositories for image-to-text:
Users who are interested in image-to-text are comparing it to the libraries listed below.
- (Python) Execute Tesseract OCR on a multi-page PDF. ☆18 · Updated last year
- Code to remove "noise" from hOCR output of Tesseract OCR. ☆14 · Updated 8 years ago
- Files and code related to the Early Modern OCR Project (eMOP) at the IDHMC ☆16 · Updated 10 years ago
- Extract images from PDF documents. Works on multiple and single PDF files ☆14 · Updated 7 years ago
- Getting, analysing and displaying lists of papers ☆15 · Updated 6 months ago
- Ruby script to download bulk results from Archive.org's TV News database of closed captions ☆14 · Updated 12 years ago
- Monitor datasets, get alerts when something happens ☆210 · Updated 6 years ago
- Data Mining Historical Newspaper Metadata (METS/ALTO formats) ☆25 · Updated 2 years ago
- Collections of English historical texts and data relating to them ☆18 · Updated 4 years ago
- Elasticsearch-like search engine supporting real-time indexing and querying ☆15 · Updated 7 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 · Updated 9 years ago
- A scraper focused on organizational GitHub accounts and their members. ☆42 · Updated 2 years ago
- GenderTracker is a service that decomposes articles and computes various gender-related metrics based on the content. ☆25 · Updated 11 years ago
- Query Wikipedia articles ☆18 · Updated 2 years ago
- A small Docker built for the OCRopus OCR system. ☆20 · Updated 7 years ago
- A selection of test lines of several early printed books as well as the corresponding individual OCRopus models and mixed models. ☆10 · Updated 7 years ago
- Scripts for Internet Archive ☆13 · Updated last month
- Colors in Library of Congress digital images. ☆32 · Updated 7 years ago
- Scraper built with Scrapy. ☆17 · Updated 8 months ago
- Tools for tracking stories on news homepages ☆48 · Updated 5 years ago
- Data collection for Airbnb business ☆13 · Updated 10 years ago
- Learning-related projects ☆17 · Updated 10 years ago
- Google Refine extension for adding columns (extending data) from DBpedia ☆39 · Updated 11 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 · Updated 9 years ago
- Converters for various file formats used for representing OCR ☆12 · Updated 2 weeks ago
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆35 · Updated this week
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc… ☆40 · Updated 8 years ago
- This is a crawler for crawling papers from Google Scholar (http://scholar.google.com). Credit for this code goes to (https://github.com/… ☆11 · Updated 8 years ago
- Responsively embed DocumentCloud notes. ☆21 · Updated 6 years ago
- 🎞 transcribe > annotate > remix > publish video and audio content ☆21 · Updated 11 months ago