measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆347 Updated 2 weeks ago
Alternatives and similar repositories for textricator:
Users interested in textricator are comparing it to the libraries listed below.
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆389 Updated 7 months ago
- Scripts and results from our OCR roundup, available on Source ☆150 Updated 6 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆187 Updated last month
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆266 Updated 2 years ago
- A post-processing tool for scanned sheets of paper. ☆1,066 Updated 8 months ago
- ☆98 Updated 3 years ago
- PDF2JSON is a conversion library based on XPDF (3.02) which can be used for high performance PDF page by page conversion to JSON and XML … ☆306 Updated 4 years ago
- Batch Optical Mark Recognition without foresight ☆39 Updated last year
- PDF to XML ALTO file converter ☆234 Updated this week
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆74 Updated 7 months ago
- ☆209 Updated 3 years ago
- The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the co… ☆83 Updated 3 years ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆65 Updated last year
- Python interface to Apache PDFBox command-line tools. ☆75 Updated 2 years ago
- Ocular is a state-of-the-art historical OCR system.