measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆350 · Updated 9 months ago
Alternatives and similar repositories for textricator
Users who are interested in textricator are comparing it to the libraries listed below.
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆276 · Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆197 · Updated 7 months ago
- Ergonomic line-by-line transcription of scanned text. ☆54 · Updated 5 years ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆249 · Updated 2 months ago
- Ocular is a state-of-the-art historical OCR system. ☆266 · Updated last year
- Web based JavaScript GUI library for proofreading/editing hOCR ☆101 · Updated 7 years ago
- Apache Annotator provides annotation enabling code for browsers, servers, and humans. ☆240 · Updated last year
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆296 · Updated last week
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆1,118 · Updated 8 months ago
- An online annotation platform for teaching and learning in the humanities. ☆108 · Updated last month
- OASIS TC Open Repository: Schema files, examples, exemplificative implementations and libraries, and documentation related to the LegalDo… ☆75 · Updated 3 years ago
- A framework for creating web-based knowledge maps ☆211 · Updated this week
- An expandable and scalable OCR pipeline ☆89 · Updated 8 years ago
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆329 · Updated 2 years ago
- A fast and friendly PDF scraping library. ☆783 · Updated 2 years ago
- Industry supported, open source PDF/A validation library ☆313 · Updated last week
- Python script to do PDF OCR conversion using Tesseract ☆376 · Updated 2 years ago
- LexPredict ContraxSuite ☆177 · Updated 2 years ago
- Extract tables from PDF pages. ☆298 · Updated 5 years ago
- Run Overview on your own system ☆130 · Updated 4 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 · Updated 9 months ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆67 · Updated 2 years ago
- A post-processing tool for scanned sheets of paper. ☆1,140 · Updated last year
- Data Curator - share usable open data ☆279 · Updated 4 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆75 · Updated last year
- Find legal citations in any block of text ☆198 · Updated 3 months ago
- ALTO XML schema - latest and all former versions ☆55 · Updated last month
- Working with hOCR in Javascript ☆136 · Updated 2 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆125 · Updated 10 years ago
- A database of court reporters, tests and other experiments ☆118 · Updated last month