HarshUpadhyay / TesseractTrainer
A small framework that automates the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
☆131 · Updated 2 years ago
Alternatives and similar repositories for TesseractTrainer:
Users who are interested in TesseractTrainer are comparing it to the libraries listed below.
- Distributed text analysis suite based on Celery ☆95 · Updated 2 years ago
- Python wrapper for the tesseract OCR engine. The module is based on OpenCV ☆177 · Updated 7 years ago
- Next generation OCR engine based on LSTMs. ☆52 · Updated 7 years ago
- An expandable and scalable OCR pipeline ☆87 · Updated 7 years ago
- A simple program to extract the text from an image before performing OCR ☆222 · Updated 5 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆64 · Updated 8 years ago
- The full-stack Chinese NLP toolkit ☆41 · Updated 10 years ago
- Attempts to determine the natural language of a selection of Unicode (utf-8) text (a clone of http://code.google.com/p/guess-language wit… ☆48 · Updated 15 years ago
- [NO LONGER MAINTAINED AS OPEN SOURCE - USE SCALETEXT.COM INSTEAD] ☆108 · Updated 11 years ago
- A library for extracting tables from PDF files ☆89 · Updated 4 years ago
- PDF Extraction Toolkit ☆41 · Updated 4 years ago
- Detect text with stroke width transform. ☆44 · Updated 10 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Updated 8 years ago
- Presentations, tutorials and data for the OCR workshop at LMU ☆17 · Updated 7 years ago
- A cluster implementation of simhash near-duplicate detection ☆32 · Updated 10 years ago
- A web-based editor for Tesseract box files ☆27 · Updated 10 years ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆11 · Updated last year
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 · Updated 7 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 · Updated 8 years ago
- Non-Overlapping Aho-Corasick Python extension, for Python 2 (str and unicode) and Python 3 ☆51 · Updated 9 years ago
- Crawlera tools ☆26 · Updated 9 years ago
- A python implementation of DEPTA ☆83 · Updated 8 years ago
- Small set of utilities to simplify writing Scrapy spiders. ☆49 · Updated 9 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 · Updated last year
- A demo using the image_features API to sort images by similarity. ☆30 · Updated 9 years ago
- Fast Python Bloom Filter using Mmap ☆13 · Updated 12 years ago
- Simple Web UI for Scrapy spider management via Scrapyd ☆51 · Updated 6 years ago
- A tool for semantic relation extraction. The program finds pairs of semantically related words based on the text definitions coming from … ☆26 · Updated 10 years ago
- Extensions for using Scrapy on Amazon AWS ☆32 · Updated 12 years ago