measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆351 Updated 10 months ago
Alternatives and similar repositories for textricator
Users who are interested in textricator are comparing it to the libraries listed below.
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆276 Updated 3 years ago
- A cross-platform command line tool for parallelised content extraction and analysis. ☆249 Updated 3 months ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆198 Updated 8 months ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆303 Updated 8 months ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆125 Updated 10 years ago
- Data Curator - share usable open data ☆281 Updated 4 years ago
- ☆117 Updated last week
- A framework for creating web-based knowledge maps ☆211 Updated this week
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆406 Updated last year
- web interface for recoll desktop search ☆292 Updated 5 years ago
- Ocular is a state-of-the-art historical OCR system. ☆266 Updated last year
- Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that co… ☆87 Updated last year
- Content ExtRactor and MINEr ☆510 Updated 3 years ago
- Run Overview on your own system ☆131 Updated 4 years ago
- A set of tools to allow PDF to XML conversion, utilising Apache Beam and other tools. The aim of this project is to bring multiple tools… ☆296 Updated last month
- PDF to XML ALTO file converter ☆259 Updated last week
- Ergonomic line-by-line transcription of scanned text. ☆54 Updated 5 years ago
- ☆210 Updated 4 years ago
- Social Feed Manager user interface application. ☆156 Updated last year
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆328 Updated 2 years ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. ☆152 Updated last month
- Extract tables from PDF pages. ☆298 Updated 5 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 Updated 2 months ago
- A post-processing tool for scanned sheets of paper. ☆85 Updated last year
- Lightweight web scraping toolkit for documents and structured data. ☆315 Updated 2 years ago
- Create beautiful documents with data. Open source pdf (and Scribus) template and mail-merge alternative. ☆279 Updated 2 weeks ago
- Python script to do PDF OCR conversion using Tesseract ☆375 Updated 2 years ago
- Apache Annotator provides annotation enabling code for browsers, servers, and humans. ☆241 Updated last year
- The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the co… ☆86 Updated 4 years ago
- Computer Assisted Text Markup and Analysis ☆96 Updated this week