measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆347 Updated 3 months ago
Alternatives and similar repositories for textricator:
Users who are interested in textricator are comparing it to the libraries listed below.
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆185 Updated last week
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆280 Updated last year
- An online annotation platform for teaching and learning in the humanities. ☆107 Updated this week
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆266 Updated 2 years ago
- Run Overview on your own system ☆123 Updated 3 years ago
- Ergonomic line-by-line transcription of scanned text. ☆50 Updated 4 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆74 Updated 6 months ago
- PAGE XML format collection for document image page content and more ☆67 Updated 3 years ago
- ☆209 Updated 3 years ago
- PDF to XML ALTO file converter ☆223 Updated last month
- A post-processing tool for scanned sheets of paper. ☆78 Updated 11 months ago
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆997 Updated last year
- A simple viewer and inspection tool for text boxes in PDF documents ☆94 Updated 2 years ago
- Web application for managing formal representations of knowledge, like controlled vocabularies, taxonomies, thesauri and glossaries ☆127 Updated 3 weeks ago
- ☆98 Updated 3 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 Updated 10 months ago
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- OASIS TC Open Repository: Schema files, examples, exemplificative implementations and libraries, and documentation related to the LegalDo… ☆62 Updated 2 years ago
- OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched ☆260 Updated 9 years ago
- Ocular is a state-of-the-art historical OCR system. ☆258 Updated 8 months ago
- Social Feed Manager user interface application. ☆155 Updated 7 months ago
- batch Optical Mark Recognition without foresight ☆39 Updated 10 months ago
- ☆84 Updated this week
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 6 years ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆311 Updated last year
- Scripts that clean up OCR and munge Hathi metadata. ☆76 Updated 7 years ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆293 Updated 2 years ago
- An application that brings humanities research methods to data visualization. ☆175 Updated 4 years ago
- Computer Assisted Text Markup and Analysis ☆92 Updated this week
- Palladio Application ☆40 Updated 3 years ago