measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆346 Updated 2 months ago
Alternatives and similar repositories for textricator
Users who are interested in textricator compare it to the libraries listed below.
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 Updated 2 years ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆245 Updated this week
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆188 Updated last week
- Scripts and results from our OCR roundup, available on Source ☆150 Updated 6 years ago
- TopicDB is a topic maps-based semantic graph store (using SQLite for persistence) ☆261 Updated 4 months ago
- ☆98 Updated 3 years ago
- Lightweight web scraping toolkit for documents and structured data. ☆310 Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆391 Updated 9 months ago
- The hOCR Embedded OCR Workflow and Output Format ☆74 Updated 9 months ago
- Working with hOCR in Javascript ☆127 Updated 2 years ago
- A scraping command line tool for the modern web ☆260 Updated 8 years ago
- File validation and characterisation. ☆179 Updated 2 weeks ago
- Web based JavaScript GUI library for proofreading/editing hOCR ☆95 Updated 6 years ago
- tool for collectively summarizing large discussions ☆143 Updated 2 years ago
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆95 Updated 6 years ago
- ☆92 Updated this week
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆295 Updated 3 years ago
- Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that co… ☆85 Updated 11 months ago
- Social Feed Manager user interface application. ☆155 Updated 10 months ago
- The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the co… ☆84 Updated 3 years ago
- An application that brings humanities research methods to data visualization. ☆178 Updated 4 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- Palladio Application ☆41 Updated 3 years ago
- A post-processing tool for scanned sheets of paper. ☆1,074 Updated 10 months ago
- OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more. ☆118 Updated 2 months ago
- A self-hosted search engine for documents. ☆630 Updated this week
- Docverter Server ☆832 Updated 8 years ago
- PDF to XML ALTO file converter ☆238 Updated this week
- A push-button Digital Humanities laboratory. ☆126 Updated 6 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 Updated 3 months ago