jsfenfen / whatwordwhere
Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.
☆84 · Mar 1, 2016 · Updated 9 years ago
Alternatives and similar repositories for whatwordwhere
Users who are interested in whatwordwhere are comparing it to the libraries listed below.
- NICAR 2016 talk about PDFs! · ☆63 · Mar 12, 2016 · Updated 9 years ago
- ☆25 · Mar 18, 2013 · Updated 12 years ago
- Tools for TICCL · ☆14 · Dec 12, 2025 · Updated 2 months ago
- A collection of stemmers in Clojure · ☆21 · Jan 17, 2023 · Updated 3 years ago
- Code & supporting data behind Pioneer Press stories and interactives. · ☆14 · Jan 16, 2018 · Updated 8 years ago
- View HOCR files with Mirador · ☆29 · Sep 27, 2017 · Updated 8 years ago
- Named Entity Recognition tool for Europeana Newspapers · ☆14 · Apr 5, 2018 · Updated 7 years ago
- Crop And Splice Segments (of scanned pages) · ☆14 · Mar 11, 2019 · Updated 6 years ago
- Notes for my talk "Exploring the Radio Spectrum for News" · ☆13 · Mar 6, 2020 · Updated 5 years ago
- ☆13 · Jul 18, 2018 · Updated 7 years ago
- An Android app that shows the edges and light sources in the live feed from the phone's camera · ☆11 · Sep 11, 2017 · Updated 8 years ago
- A Ruby parser for electronic candidate, PAC and party campaign filings from the Federal Election Commission. · ☆15 · Feb 3, 2024 · Updated 2 years ago
- Manuals, lexica, OCR test data for PoCoTo and the profiler · ☆15 · Jul 2, 2021 · Updated 4 years ago
- Deutsch Language Tool Kit · ☆12 · Aug 31, 2015 · Updated 10 years ago
- Guess a person's gender by their first name. Caveats apply. · ☆18 · May 6, 2023 · Updated 2 years ago
- HOCR manipulation and utility library; provides hocr2pdf binary. · ☆14 · Mar 5, 2018 · Updated 7 years ago
- Fork of dump1090-stream-parser. Takes SBS output from `dump1090` and puts it into a database. · ☆13 · Apr 16, 2019 · Updated 6 years ago
- An Editor for creating simple or complex OCR workflows · ☆17 · Jun 13, 2024 · Updated last year
- An easy-to-use point-and-click geocoder 🌍📍 · ☆15 · Jan 6, 2023 · Updated 3 years ago
- Code for extracting data from a large number of PDFs, particularly FCC political ad documents · ☆15 · Oct 26, 2017 · Updated 8 years ago
- GermaNER: Free Open German Named Entity Recognition Tool · ☆36 · Dec 16, 2023 · Updated 2 years ago
- OCR-D wrapper for detectron2 based segmentation models · ☆17 · May 1, 2025 · Updated 9 months ago
- R tools for journalists · ☆18 · Mar 9, 2018 · Updated 7 years ago
- Tools for managing deployment & operations of Common Search. · ☆12 · Aug 26, 2016 · Updated 9 years ago
- Presentations, tutorials and data for the OCR workshop at LMU · ☆16 · Jun 2, 2017 · Updated 8 years ago
- blocks template · ☆18 · Mar 28, 2021 · Updated 4 years ago
- Development version of ndlstm, multidimensional LSTMs for TensorFlow · ☆19 · Feb 20, 2018 · Updated 7 years ago
- An extensible viewer for OCR-D mets.xml files · ☆22 · May 30, 2024 · Updated last year
- Natural language generation with hidden markov models (using hmmlearn) · ☆24 · Sep 24, 2016 · Updated 9 years ago
- Text-Induced Corpus Clean-up · ☆20 · Jun 20, 2023 · Updated 2 years ago
- Multi-dimensional LSTM implementation in TensorFlow · ☆22 · Sep 25, 2017 · Updated 8 years ago
- Ergonomic line-by-line transcription of scanned text. · ☆54 · Feb 2, 2026 · Updated 2 weeks ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. · ☆22 · Feb 21, 2018 · Updated 7 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. · ☆24 · Sep 24, 2015 · Updated 10 years ago
- Next generation OCR engine based on LSTMs. · ☆52 · Apr 8, 2018 · Updated 7 years ago
- Merge Butane configurations · ☆21 · Aug 12, 2025 · Updated 6 months ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr · ☆22 · Mar 13, 2019 · Updated 6 years ago
- ☆25 · Apr 22, 2018 · Updated 7 years ago
- OCR evaluation brought to you by the University of Alicante · ☆67 · Sep 1, 2022 · Updated 3 years ago