measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆350 · Updated 7 months ago
Alternatives and similar repositories for textricator
Users who are interested in textricator are comparing it to the libraries listed below.
- Python-based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆275 · Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆195 · Updated 5 months ago
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆1,090 · Updated 6 months ago
- PDF to XML ALTO file converter ☆254 · Updated last month
- ☆210 · Updated 4 years ago
- ☆109 · Updated last month
- Ergonomic line-by-line transcription of scanned text. ☆54 · Updated 4 years ago
- Ocular is a state-of-the-art historical OCR system. ☆265 · Updated last year
- Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that co… ☆86 · Updated last year
- A cross-platform command line tool for parallelised content extraction and analysis. ☆247 · Updated 2 weeks ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆294 · Updated 3 years ago
- Data Curator - share usable open data ☆276 · Updated 3 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆124 · Updated 10 years ago
- Lightweight web scraping toolkit for documents and structured data. ☆314 · Updated last year
- A framework for creating web-based knowledge maps ☆208 · Updated this week
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. ☆148 · Updated last year
- An online annotation platform for teaching and learning in the humanities. ☆108 · Updated 2 months ago
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 6 years ago
- Web application for managing formal representations of knowledge, like controlled vocabularies, taxonomies, thesauri and glossaries ☆136 · Updated last month
- OASIS TC Open Repository: Schema files, examples, exemplificative implementations and libraries, and documentation related to the LegalDo… ☆72 · Updated 3 years ago
- Web based JavaScript GUI library for proofreading/editing hOCR ☆100 · Updated 7 years ago
- ☆27 · Updated last month
- Social Feed Manager user interface application. ☆156 · Updated last year
- Run Overview on your own system ☆126 · Updated 4 years ago
- ☆98 · Updated 4 years ago
- Extract tables from PDF pages. ☆298 · Updated 5 years ago
- Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums. ☆240 · Updated 2 weeks ago
- Computer Assisted Text Markup and Analysis ☆94 · Updated 2 weeks ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆81 · Updated 7 years ago
- The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the co… ☆86 · Updated 3 years ago