measuresforjustice / textricator
Textricator is a tool to extract text from documents and generate structured data.
☆347 Updated last month
Alternatives and similar repositories for textricator:
Users who are interested in textricator compare it to the libraries listed below.
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Tex… ☆1,025 Updated this week
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 Updated 2 years ago
- Casebox: Secure all your information and team communication in one place ☆218 Updated 5 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆188 Updated 2 months ago
- LA-PDFText is a system for extracting accurate text from PDF-based research articles (and an interface to be able to improve performance … ☆82 Updated 7 years ago
- ☆98 Updated 3 years ago
- ☆92 Updated last week
- PDF to XML ALTO file converter ☆237 Updated last week
- Lightweight web scraping toolkit for documents and structured data. ☆311 Updated last year
- 🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based ☆314 Updated last year
- Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that co… ☆83 Updated 10 months ago
- Data Curator - share usable open data ☆271 Updated 3 years ago
- Python binding to libpoppler with focus on text extraction ☆97 Updated 3 years ago
- The hOCR Embedded OCR Workflow and Output Format ☆74 Updated 8 months ago
- batch Optical Mark Recognition without foresight ☆39 Updated last year
- LexPredict ContraxSuite ☆168 Updated 2 years ago
- Industry supported, open source PDF/A validation library ☆289 Updated this week
- Social Feed Manager user interface application. ☆155 Updated 10 months ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆442 Updated last year
- A place to collect and share knowledge about liberating data from PDFs ☆54 Updated 3 years ago
- Web application for managing formal representations of knowledge, like controlled vocabularies, taxonomies, thesauri and glossaries ☆130 Updated 2 months ago
- Website crawler with built-in exploration and control web interface ☆350 Updated 2 months ago
- Booktype is a free, open source platform that produces beautiful, engaging books formatted for print, Amazon, iBooks and almost any eread… ☆934 Updated 2 years ago
- An eBook Framework (CSS + template) ☆217 Updated 4 years ago
- Ocular is a state-of-the-art historical OCR system. ☆262 Updated 10 months ago
- ☆209 Updated 3 years ago
- Publishing Framework for Large-Scale Data-Rich Interactive Web Pages ☆178 Updated 3 years ago
- PDF.js + Hypothesis viewer / annotator ☆387 Updated 2 months ago
- Create and share elegant timelines and timemaps fast ☆280 Updated 3 years ago