Convert a corpus of PDFs to clean text files on a distributed architecture
☆40 · Mar 5, 2024 · Updated 2 years ago
Alternatives and similar repositories for ocr-pipeline
Users that are interested in ocr-pipeline are comparing it to the libraries listed below.
- Named Entity Recognition tool for Europeana Newspapers ☆14 · Apr 5, 2018 · Updated 8 years ago
- Trading Consequences data and code ☆15 · Mar 5, 2015 · Updated 11 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 · Sep 24, 2015 · Updated 10 years ago
- Compress data over a Stream using the snappy framing format ☆54 · May 4, 2022 · Updated 4 years ago
- Watching the SCOTUS ☆178 · Oct 7, 2015 · Updated 10 years ago
- The iOS SDK for ChatSecure-Push-Server ☆16 · Oct 27, 2019 · Updated 6 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆16 · Oct 7, 2019 · Updated 6 years ago
- Locate and extract tables and figures in PDFs ☆43 · Mar 19, 2021 · Updated 5 years ago
- Example robot for automating GnuCash with image templates and OCR ☆12 · Jan 23, 2024 · Updated 2 years ago
- This app demonstrates WatchSession support in Titanium 5.0 ☆11 · Sep 6, 2019 · Updated 6 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 · Sep 14, 2016 · Updated 9 years ago
- Toolbox for OCR post-correction ☆120 · Sep 19, 2019 · Updated 6 years ago
- PlexusRx demo app for iOS 11 ☆14 · Nov 3, 2017 · Updated 8 years ago
- Rails application supporting the creation of OCR and the IIIF Content Search API ☆34 · Dec 14, 2022 · Updated 3 years ago
- This sample app demonstrates how to make the activities and content of your app searchable via Spotlight, Safari and Siri by using new AP… ☆12 · Oct 13, 2015 · Updated 10 years ago
- Want to learn more about Free Law Project technologies, policies and thinking? Get the literature here. ☆25 · Jul 6, 2021 · Updated 4 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 · Sep 29, 2016 · Updated 9 years ago
- ☆11 · Aug 8, 2016 · Updated 9 years ago
- OCR for DjVu ☆47 · Oct 3, 2022 · Updated 3 years ago
- Since I originally wrote this a module called request has come on the scene. You might want to try that before mucking about with extrac… ☆26 · Nov 16, 2015 · Updated 10 years ago
- An Awesome List for getting started with web archiving ☆19 · Dec 21, 2018 · Updated 7 years ago
- Gamera 3 for Python 2 (deprecated) ☆39 · Aug 15, 2022 · Updated 3 years ago
- Code for KDD 2014 paper "Mining Topics in Documents: Standing on the Shoulders of Big Data" ☆21 · Oct 6, 2015 · Updated 10 years ago
- Website for America's Public Bible ☆11 · Oct 1, 2020 · Updated 5 years ago
- Deep learning for named entity recognition on CoNLL-2003 ☆10 · Dec 23, 2016 · Updated 9 years ago
- A repository for documentation and tutorials (recipes) that help us cook up great projects ☆12 · Aug 31, 2023 · Updated 2 years ago
- ☆11 · Jan 20, 2021 · Updated 5 years ago
- A Discord bot for Star Wars: Galaxy of Heroes ☆12 · Mar 10, 2019 · Updated 7 years ago
- Generate topic models from open text extracted from files in disk images ☆10 · Apr 11, 2023 · Updated 3 years ago
- Dockerfiles for Avalon Media System - http://github.com/avalonmediasystem/avalon ☆10 · Jun 2, 2026 · Updated 2 weeks ago
- Docs, notes and resources that don't fit elsewhere. ☆13 · May 23, 2023 · Updated 3 years ago
- R code to get co-citation networks on social networks in the social sciences vs physics and computer science using Web of Science data. ☆22 · Jan 28, 2015 · Updated 11 years ago
- Image thumbnailing middleware for Connect.js/Express.js utilizing Smartcrop.js ☆30 · Apr 3, 2018 · Updated 8 years ago
- Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Libr… ☆44 · Jun 10, 2026 · Updated last week
- LOC Standards, Schemas, Stylesheets, etc. ☆11 · Sep 30, 2025 · Updated 8 months ago
- Change screen brightness on Linux systems ☆12 · Jul 6, 2015 · Updated 10 years ago
- An expandable and scalable OCR pipeline ☆90 · Nov 14, 2017 · Updated 8 years ago
- A Python implementation of the Viterbi Algorithm with Bigram Hidden Markov Model(HMM) taggers for predicting Parts of Speech(POS) tags. -… ☆12 · Feb 9, 2016 · Updated 10 years ago
- Simple app for visual editing of Page XML files ☆31 · Sep 25, 2025 · Updated 8 months ago