danvk / boxedit
A web-based editor for Tesseract box files
☆27 · Updated 10 years ago
Alternatives and similar repositories for boxedit
Users who are interested in boxedit are comparing it to the libraries listed below.
- Next-generation OCR engine based on LSTMs. ☆52 · Updated 7 years ago
- An expandable and scalable OCR pipeline. ☆87 · Updated 7 years ago
- A node.js library for extracting data from scanned forms. ☆117 · Updated 2 years ago
- Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the hOCR standard. ☆84 · Updated 9 years ago
- Exploring extracting tables from a PDF to CSV using PDF.js. ☆104 · Updated 8 years ago
- REST endpoint for Tabula. ☆25 · Updated 6 years ago
- Convert a corpus of PDFs to clean text files on a distributed architecture. ☆39 · Updated last year
- Official diybookscanner repository. ☆39 · Updated 11 years ago
- Structured data from image-based PDF files. ☆88 · Updated 12 years ago
- A small Docker image built for the OCRopus OCR system. ☆20 · Updated 7 years ago
- Tools for working with Optical Character Recognition output. ☆16 · Updated 11 years ago
- Recognition models for Kraken and CLSTM. ☆14 · Updated 5 years ago
- Stencila for Python. ☆17 · Updated 6 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆24 · Updated 10 years ago
- A place to collect and share knowledge about liberating data from PDFs. ☆54 · Updated 3 years ago
- Create an ERD for a database given as JSON-table-schema. ☆11 · Updated 9 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 · Updated 7 years ago
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/… ☆131 · Updated 2 years ago
- Google Refine extension for adding columns (extending data) from DBpedia. ☆39 · Updated 11 years ago
- Python bindings for Neo4j. ☆27 · Updated 10 years ago
- Gamera 3 for Python 2 (deprecated). ☆39 · Updated 2 years ago
- PDF Extraction Toolkit. ☆41 · Updated 4 years ago
- PDF Table Extractor: repository to hold a revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz. ☆38 · Updated last year
- Binary Python bindings for poppler utils for content extraction. ☆42 · Updated 4 years ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools). ☆31 · Updated last year
- See https://github.com/tworavens/tworavens for the current repository for this project and http://2ra.vn for project pages. ☆30 · Updated 6 years ago
- gzipstream allows Python to process multi-part gzip files from a streaming source. ☆23 · Updated 8 years ago
- Server endpoint for communicating with a stanford-ner server. ☆25 · Updated 7 years ago
- Python binding to libpoppler with a focus on text extraction. ☆97 · Updated 3 years ago
- FacetView is a pure JavaScript frontend for ElasticSearch. ☆291 · Updated 10 years ago