measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆345 · Updated 2 months ago
Alternatives and similar repositories for textricator
Users who are interested in textricator are comparing it to the libraries listed below.
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆188 · Updated 2 weeks ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆245 · Updated last week
- The hOCR Embedded OCR Workflow and Output Format ☆74 · Updated 9 months ago
- Ocular is a state-of-the-art historical OCR system. ☆262 · Updated last year
- The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the co … ☆84 · Updated 3 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 · Updated 4 years ago
- batch Optical Mark Recognition without foresight ☆39 · Updated last year
- ☆95 · Updated last week
- Extract tables from PDF files ☆357 · Updated 9 years ago
- ☆209 · Updated 3 years ago
- Web based JavaScript GUI library for proofreading/editing hOCR ☆95 · Updated 6 years ago
- Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that co… ☆85 · Updated 11 months ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆294 · Updated 3 years ago
- An expandable and scalable OCR pipeline ☆87 · Updated 7 years ago
- LexPredict ContraxSuite ☆170 · Updated 2 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 · Updated 2 years ago
- ☆98 · Updated 3 years ago
- Lightweight web scraping toolkit for documents and structured data. ☆310 · Updated last year
- Run Overview on your own system ☆124 · Updated 3 years ago
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆1,040 · Updated last month
- A framework for creating web-based knowledge maps ☆204 · Updated this week
- PDF to XML ALTO file converter ☆240 · Updated last week
- track changes to the news, where news is anything with an RSS feed ☆178 · Updated 4 years ago
- Publishing Framework for Large-Scale Data-Rich Interactive Web Pages ☆178 · Updated 3 years ago
- Extract tables from PDF pages. ☆291 · Updated 4 years ago
- Python binding to libpoppler with focus on text extraction ☆97 · Updated 3 years ago
- OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more. ☆119 · Updated 3 months ago
- Conversions between various OCR formats ☆78 · Updated 2 years ago
- Framework and command-line tools for integrating FollowTheMoney data streams from multiple sources ☆210 · Updated this week
- An online annotation platform for teaching and learning in the humanities. ☆108 · Updated 3 months ago