Tooling to extract data from scanned paper forms OCR-ed by Tesseract, using the hOCR standard.
☆83 · Mar 1, 2016 · Updated 10 years ago
Alternatives and similar repositories for whatwordwhere
Users that are interested in whatwordwhere are comparing it to the libraries listed below.
- OCRopus model for Gothic print (Fraktur) ☆19 · Feb 16, 2020 · Updated 6 years ago
- A collection of stemmers in Clojure ☆21 · Jan 17, 2023 · Updated 3 years ago
- Tools for TICCL ☆14 · Dec 12, 2025 · Updated 5 months ago
- An easy-to-use point-and-click geocoder 🌍📍 ☆15 · Jan 6, 2023 · Updated 3 years ago
- View HOCR files with Mirador ☆29 · Sep 27, 2017 · Updated 8 years ago
- blocks template ☆18 · Mar 28, 2021 · Updated 5 years ago
- Course in Document and Content Analysis. ☆14 · Apr 18, 2020 · Updated 6 years ago
- Notes for my talk "Exploring the Radio Spectrum for News" ☆13 · Mar 6, 2020 · Updated 6 years ago
- Named Entity Recognition tool for Europeana Newspapers ☆14 · Apr 5, 2018 · Updated 8 years ago
- Manuals, lexica, OCR test data for PoCoTo and the profiler ☆15 · Jul 2, 2021 · Updated 4 years ago
- R tools for journalists ☆18 · Mar 9, 2018 · Updated 8 years ago
- ☆25 · Mar 18, 2013 · Updated 13 years ago
- Code & supporting data behind Pioneer Press stories and interactives. ☆14 · Jan 16, 2018 · Updated 8 years ago
- A Ruby parser for electronic candidate, PAC and party campaign filings from the Federal Election Commission. ☆15 · Feb 3, 2024 · Updated 2 years ago
- Versification mappings and versification sniffing ☆20 · Aug 4, 2025 · Updated 9 months ago
- Deutsch Language Tool Kit ☆12 · Aug 31, 2015 · Updated 10 years ago
- Natural language generation with hidden Markov models (using hmmlearn) ☆25 · Sep 24, 2016 · Updated 9 years ago
- Stand-alone implementation of UCD's IIIF image re-formatting tool + plugin to integrate with Mirador IIIF-compliant image viewer ☆18 · Jul 31, 2017 · Updated 8 years ago
- An Editor for creating simple or complex OCR workflows ☆17 · Jun 13, 2024 · Updated last year
- fork of tesseract for emscripten ☆21 · Jul 21, 2015 · Updated 10 years ago
- Split a JSON file with hierarchical data to multiple CSV files ☆28 · Apr 9, 2023 · Updated 3 years ago
- Guess a person's gender by their first name. Caveats apply. ☆18 · May 6, 2023 · Updated 3 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆23 · Feb 21, 2018 · Updated 8 years ago
- Efficient hOCR tooling ☆57 · Aug 18, 2025 · Updated 9 months ago
- Investigative tool for extracting relevant areas from many documents ☆14 · Nov 17, 2015 · Updated 10 years ago
- ☆13 · Jul 18, 2018 · Updated 7 years ago
- ☆25 · Apr 22, 2018 · Updated 8 years ago
- Presentations, tutorials and data for the OCR workshop at LMU ☆16 · Jun 2, 2017 · Updated 8 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 · Feb 2, 2026 · Updated 3 months ago
- Development version of ndlstm, multidimensional LSTMs for TensorFlow ☆19 · Feb 20, 2018 · Updated 8 years ago
- Code for extracting data from a large number of PDFs, particularly FCC political ad documents ☆15 · Oct 26, 2017 · Updated 8 years ago
- Next generation OCR engine based on LSTMs. ☆51 · Apr 8, 2018 · Updated 8 years ago
- Some helpful bash profile functions for working with earth imagery ☆33 · Mar 8, 2020 · Updated 6 years ago
- PDF table extraction ☆10 · Dec 14, 2021 · Updated 4 years ago
- Multi-dimensional LSTM implementation in TensorFlow ☆22 · Sep 25, 2017 · Updated 8 years ago
- nicar 17: advanced pdf manipulation ☆18 · Mar 4, 2017 · Updated 9 years ago
- Text-Induced Corpus Clean-up ☆20 · Jun 20, 2023 · Updated 2 years ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr ☆22 · Mar 13, 2019 · Updated 7 years ago
- Rapidly scaffold out visual-vocabulary projects ☆11 · Jan 10, 2019 · Updated 7 years ago