HarshUpadhyay / TesseractTrainer
A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
☆132 Updated 2 years ago
Alternatives and similar repositories for TesseractTrainer
Users that are interested in TesseractTrainer are comparing it to the libraries listed below
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- ☆129 Updated 8 years ago
- Python wrapper for the tesseract OCR engine. The module is based on OpenCV ☆177 Updated 8 years ago
- A library for extracting tables from PDF files ☆92 Updated 5 years ago
- Sometimes sites make crawling hard. Selenium-crawler uses selenium automation to fix that. ☆125 Updated 12 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 Updated 3 years ago
- A simple python OCR engine using opencv ☆531 Updated last year
- Distributed text analysis suite based on Celery ☆96 Updated 2 years ago
- Mapping photos of Old New York ☆291 Updated 9 months ago
- Next generation OCR engine based on LSTMs. ☆52 Updated 7 years ago
- Various documents related to Tesseract OCR ☆266 Updated 4 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 Updated 8 years ago
- Extract tables from PDF pages. ☆296 Updated 5 years ago
- Source code of demo app for image comparison ☆74 Updated 9 years ago
- Attempts to determine the natural language of a selection of Unicode (utf-8) text (a clone of http://code.google.com/p/guess-language wit… ☆48 Updated 15 years ago
- End to end OCR system for Telugu. Based on Convolutional Neural Networks. ☆50 Updated last month
- Detect text with stroke width transform. ☆43 Updated 11 years ago
- Extract meaningful content from pdf and psd file, such as texts and images both linked into a common JSON string ☆36 Updated 7 years ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆216 Updated 5 years ago
- A Python Perceptual Image Hashing Module ☆214 Updated 3 years ago
- Breaking captchas using torch ☆181 Updated 9 years ago
- An elementary captcha decoder written in python ☆156 Updated 10 years ago
- An efficient simhash implementation for python ☆126 Updated 5 years ago
- Crack number and Chinese captcha with both traditional and deep learning methods, based on Torch and python. ☆34 Updated 9 years ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆96 Updated 7 years ago
- Automatically extracts and normalizes an online article or blog post publication date ☆117 Updated 2 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 Updated 10 years ago
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features. ☆108 Updated 5 months ago
- Training/test data for Dragnet ☆41 Updated 10 years ago
- Tool to extract news articles from newspaper and give the context about the news ☆211 Updated 8 years ago