Convert a corpus of PDFs to clean text files on a distributed architecture
☆39Mar 5, 2024Updated 2 years ago
Alternatives and similar repositories for ocr-pipeline
Users that are interested in ocr-pipeline are comparing it to the libraries listed below.
- Named Entity Recognition tool for Europeana Newspapers☆14Apr 5, 2018Updated 8 years ago
- Some bits of javascript to transcribe scanned pages using PageXML☆17Mar 18, 2024Updated 2 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images.☆24Sep 24, 2015Updated 10 years ago
- ☆25Oct 9, 2022Updated 3 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]☆25Sep 14, 2016Updated 9 years ago
- Rails application supporting the creation of OCR and the IIIF Content Search API☆34Dec 14, 2022Updated 3 years ago
- Want to learn more about Free Law Project technologies, policies and thinking? Get the literature here.☆25Jul 6, 2021Updated 4 years ago
- Identify objects in an image, additionally assigning each pixel of the image to a particular object☆31Sep 17, 2025Updated 7 months ago
- ☆11Aug 8, 2016Updated 9 years ago
- OCR for DjVu☆47Oct 3, 2022Updated 3 years ago
- Gamera 3 for Python 2 (deprecated)☆39Aug 15, 2022Updated 3 years ago
- Deep learning for named entity recognition on CoNLL-2003☆10Dec 23, 2016Updated 9 years ago
- A repository for documentation and tutorials (recipes) that help us cook up great projects☆12Aug 31, 2023Updated 2 years ago
- A module for Omeka S that provides an API for the Neatline 3 single page application☆18Mar 26, 2023Updated 3 years ago
- Generate topic models from open text extracted from files in disk images☆10Apr 11, 2023Updated 3 years ago
- Dockerfiles for Avalon Media System - http://github.com/avalonmediasystem/avalon☆10Updated this week
- Image thumbnailing middleware for Connect.js/Express.js utilizing Smartcrop.js☆30Apr 3, 2018Updated 8 years ago
- Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Libr…☆43Updated this week
- Change screen brightness on Linux systems☆12Jul 6, 2015Updated 10 years ago
- A Python implementation of the Viterbi Algorithm with Bigram Hidden Markov Model (HMM) taggers for predicting Parts of Speech (POS) tags. -…☆12Feb 9, 2016Updated 10 years ago
- Returns true if a windows file path does not contain any invalid characters.☆12Jan 27, 2023Updated 3 years ago
- Web-based page layout editor created for EMOP (Early Modern OCR Project).☆11May 21, 2021Updated 4 years ago
- This repo holds the source code for the web application☆15Jul 6, 2023Updated 2 years ago
- Arduino library to generate a PWM signal over a shift register (74HC595)☆12Sep 25, 2020Updated 5 years ago
- Object annotation maker in VOC Pascal format using object images and background images☆10Feb 27, 2021Updated 5 years ago
- An API implementing a grammar for text analysis☆13Nov 10, 2015Updated 10 years ago
- FFI-based byte buffers for Idris☆10Jun 21, 2019Updated 6 years ago
- Pure python script that takes user query and summarizes news related to it.☆25Jul 6, 2022Updated 3 years ago
- Learning text classification for journalists through DocHate tips☆10May 13, 2020Updated 5 years ago
- Prototype wikidata portal project.☆10May 3, 2024Updated last year
- Python for Humanities☆13Updated this week
- Adds the ability to transcribe items using the Scripto library.☆17Sep 9, 2025Updated 7 months ago
- a little nodejs server and script that extracts letters from images via tesseract☆19Mar 4, 2015Updated 11 years ago
- A reliable diacritics database with their associated ASCII characters☆13May 3, 2020Updated 5 years ago
- logic programming in elixir☆10Nov 1, 2018Updated 7 years ago
- An API spec to define how to find text in a Web document, using basic information, and return DOM ranges☆15Mar 5, 2019Updated 7 years ago
- Train small sequence models in your browser with WebGPU.☆34Dec 3, 2025Updated 4 months ago
- A-Frame and AR.js workshop☆14Nov 12, 2019Updated 6 years ago
- Fish shell plugin for fzf git bindings☆10Dec 13, 2021Updated 4 years ago