Scripts and results from our OCR roundup, available on Source
☆150 · Feb 20, 2019 · Updated 7 years ago
Alternatives and similar repositories for ocr_testing
Users that are interested in ocr_testing are comparing it to the libraries listed below
- Repository collecting all the submodules for the new PyTorch-based OCR System. ☆142 · Feb 22, 2021 · Updated 5 years ago
- Core libraries by the PRImA Research Lab ☆16 · Jul 30, 2024 · Updated last year
- Next generation OCR engine based on LSTMs. ☆51 · Apr 8, 2018 · Updated 7 years ago
- Whos On First admin data for US, homepage: https://whosonfirst.org ☆20 · Oct 10, 2025 · Updated 4 months ago
- Line based ATR Engine based on OCRopy ☆1,184 · May 12, 2025 · Updated 9 months ago
- Yara rules ☆22 · Mar 27, 2023 · Updated 2 years ago
- Upload SQLite database files to Datasette ☆14 · Nov 10, 2025 · Updated 3 months ago
- CLI for parsing FEC files, for federal campaign finance pipelines ☆19 · Updated this week
- Small collection of PAGE XML related scripts used at the ZPD Würzburg ☆12 · Aug 2, 2024 · Updated last year
- convert PubLayNet data into METS/PAGE-XML ☆10 · Mar 17, 2020 · Updated 5 years ago
- Learning text classification for journalists through DocHate tips ☆10 · May 13, 2020 · Updated 5 years ago
- transform a datapoint from a website into a CSV time-series dataset using the wayback machine ☆12 · May 24, 2023 · Updated 2 years ago
- ☆10 · Mar 10, 2019 · Updated 6 years ago
- Docker Container for a Make-based, PDF extraction using OCR ☆13 · Jul 31, 2024 · Updated last year
- Command line tool to convert page layout files to the latest PAGE XML format. It supports all previous versions of the PAGE format as wel… ☆24 · Jan 30, 2021 · Updated 5 years ago
- Implementation of BertGrid : https://arxiv.org/abs/1909.04948 ☆30 · Apr 10, 2024 · Updated last year
- VGS Collect use cases ☆14 · Dec 10, 2025 · Updated 2 months ago
- Watching the SCOTUS ☆178 · Oct 7, 2015 · Updated 10 years ago
- Dense Article Dataset (DAD): A Benchmark Dataset for Document Layout Analysis ☆16 · Jan 13, 2022 · Updated 4 years ago
- An SQL loader for datasets published via Socrata ☆28 · Dec 8, 2022 · Updated 3 years ago
- ☆15 · Jun 22, 2020 · Updated 5 years ago
- Clojure library exposing newline delimited files as lightning fast databases ☆14 · Feb 27, 2025 · Updated last year
- ☆126 · Apr 18, 2020 · Updated 5 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 · Sep 29, 2016 · Updated 9 years ago
- Homographs: brutefind homographs within a font ☆19 · Apr 21, 2017 · Updated 8 years ago
- A POC tricking the end user of a captcha to reveal his browser history ☆87 · Mar 27, 2018 · Updated 7 years ago
- Render a map for any query with a geometry column ☆28 · Aug 10, 2024 · Updated last year
- A module for accessing a XLSX spreadsheet as a JavaScript object. ☆16 · Aug 25, 2019 · Updated 6 years ago
- This repository contains all the tools we are working with related to Chequeabot's ecosystem. ☆15 · May 27, 2025 · Updated 9 months ago
- Python-based tools for document analysis and OCR ☆3,472 · May 22, 2021 · Updated 4 years ago
- Repo to host the forms dataset ☆17 · Feb 15, 2021 · Updated 5 years ago
- ☆72 · Jun 13, 2018 · Updated 7 years ago
- carebot-tracker.js — Carebot's tracking component for Google Analytics events ☆17 · Apr 19, 2016 · Updated 9 years ago
- Simple ranking metrics for PyTorch on CPU or GPU ☆15 · Nov 20, 2020 · Updated 5 years ago
- Cookiecutter template for a Python package. ☆21 · Dec 15, 2025 · Updated 2 months ago
- Parses Google Documents formatted for annotated transcripts –– with JavaScript ☆18 · Feb 14, 2022 · Updated 4 years ago
- Experimental form data extraction for journalism ☆78 · Dec 29, 2020 · Updated 5 years ago
- A demonstration of how to build and publish pages with the baker build tool ☆20 · Aug 30, 2024 · Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆408 · Aug 10, 2024 · Updated last year