measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆348 · Updated 5 months ago
Alternatives and similar repositories for textricator
Users who are interested in textricator are comparing it to the libraries listed below.
- Python-based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆271 · Updated 2 years ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆248 · Updated last month
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆195 · Updated 3 months ago
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 6 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Uses only open source tools. Please tip! ☆296 · Updated 3 months ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆395 · Updated last year
- Ergonomic line-by-line transcription of scanned text. ☆53 · Updated 4 years ago
- ☆104 · Updated 2 weeks ago
- Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that co… ☆86 · Updated last year
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆1,070 · Updated 4 months ago
- ☆211 · Updated 4 years ago
- Data Curator - share usable open data ☆276 · Updated 3 years ago
- Run Overview on your own system ☆126 · Updated 4 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 · Updated last week
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆121 · Updated 10 years ago
- Web-based JavaScript GUI library for proofreading/editing hOCR ☆95 · Updated 6 years ago
- LexPredict ContraxSuite ☆173 · Updated 2 years ago
- Ocular is a state-of-the-art historical OCR system. ☆264 · Updated last year
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆81 · Updated 7 years ago
- A post-processing tool for scanned sheets of paper. ☆1,102 · Updated last year
- Web application for managing formal representations of knowledge, like controlled vocabularies, taxonomies, thesauri and glossaries ☆135 · Updated last week
- PDF to XML ALTO file converter ☆252 · Updated 3 weeks ago
- The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the co… ☆85 · Updated 3 years ago
- Create beautiful documents with data. Open source pdf (and Scribus) template and mail-merge alternative. ☆269 · Updated 8 months ago
- The hOCR Embedded OCR Workflow and Output Format ☆74 · Updated last year
- OASIS TC Open Repository: Schema files, examples, exemplificative implementations and libraries, and documentation related to the LegalDo… ☆69 · Updated 3 years ago
- Documentation and project-wide issues for the Website Monitoring project (a.k.a. "Scanner") ☆108 · Updated 6 months ago
- A high performance bibliographic information service: https://biblio-glutton.readthedocs.io ☆143 · Updated 2 months ago
- A self‑hosted search engine for documents. Help us improve Datashare by answering a survey on structured content: https://forms.gle/PYgus… ☆653 · Updated this week
- Find legal citations in any block of text ☆169 · Updated 2 months ago