Convert a corpus of PDFs to clean text files on a distributed architecture
☆39 · Mar 5, 2024 · Updated 2 years ago
Alternatives and similar repositories for ocr-pipeline
Users that are interested in ocr-pipeline are comparing it to the libraries listed below.
- This is a Django project template using uWSGI as application server. ☆10 · May 15, 2019 · Updated 6 years ago
- Named Entity Recognition tool for Europeana Newspapers ☆14 · Apr 5, 2018 · Updated 7 years ago
- Trading Consequences data and code ☆15 · Mar 5, 2015 · Updated 11 years ago
- Some bits of javascript to transcribe scanned pages using PageXML ☆17 · Mar 18, 2024 · Updated 2 years ago
- Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images. ☆24 · Sep 24, 2015 · Updated 10 years ago
- Compress data over a Stream using the snappy framing format ☆54 · May 4, 2022 · Updated 3 years ago
- ☆25 · Oct 9, 2022 · Updated 3 years ago
- Watching the SCOTUS ☆178 · Oct 7, 2015 · Updated 10 years ago
- The iOS SDK for ChatSecure-Push-Server ☆16 · Oct 27, 2019 · Updated 6 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆16 · Oct 7, 2019 · Updated 6 years ago
- 🐙 JSON diff diver — the time machine for your JSON objects ☆16 · Dec 22, 2025 · Updated 3 months ago
- Locate and extract tables and figures in PDFs ☆43 · Mar 19, 2021 · Updated 5 years ago
- Hyperloop module for https://github.com/WenchaoD/FSCalendar ☆12 · Jan 24, 2019 · Updated 7 years ago
- Example robot for automating GnuCash with image templates and OCR ☆12 · Jan 23, 2024 · Updated 2 years ago
- Toolbox for OCR post-correction ☆122 · Sep 19, 2019 · Updated 6 years ago
- Notebooks to accompany the blog posts about the 2nd place Kaggle RSNA winners: https://github.com/darraghdog/rsna ☆30 · Jan 29, 2020 · Updated 6 years ago
- Deep learning code ☆10 · Jun 9, 2023 · Updated 2 years ago
- Rails application supporting the creation of OCR and the IIIF Content Search API ☆34 · Dec 14, 2022 · Updated 3 years ago
- Want to learn more about Free Law Project technologies, policies and thinking? Get the literature here. ☆25 · Jul 6, 2021 · Updated 4 years ago
- Identify objects in an image, additionally assigning each pixel of the image to a particular object ☆31 · Sep 17, 2025 · Updated 6 months ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 · Sep 29, 2016 · Updated 9 years ago
- Golang Packet Client and Server with Handler Support ☆12 · Feb 6, 2022 · Updated 4 years ago
- OCR for DjVu ☆47 · Oct 3, 2022 · Updated 3 years ago
- Since I originally wrote this a module called request has come on the scene. You might want to try that before mucking about with extrac… ☆26 · Nov 16, 2015 · Updated 10 years ago
- An Awesome List for getting started with web archiving ☆19 · Dec 21, 2018 · Updated 7 years ago
- Gamera 3 for Python 2 (deprecated) ☆39 · Aug 15, 2022 · Updated 3 years ago
- Website for America's Public Bible ☆11 · Oct 1, 2020 · Updated 5 years ago
- Speed up your Localization / Internationalization efforts by automating translation with a single script ☆27 · Feb 18, 2017 · Updated 9 years ago
- Deep learning for named entity recognition on CoNLL-2003 ☆10 · Dec 23, 2016 · Updated 9 years ago
- This repository is based on Detectron. It can detect quadrilaterals (four sides are not parallel) instead of only bounding boxes. It c… ☆11 · Jun 11, 2022 · Updated 3 years ago
- ☆13 · Jun 9, 2021 · Updated 4 years ago
- A repository for documentation and tutorials (recipes) that help us cook up great projects ☆12 · Aug 31, 2023 · Updated 2 years ago
- A module for Omeka S that provides an API for the Neatline 3 single page application ☆18 · Mar 26, 2023 · Updated 3 years ago
- ☆11 · Jan 20, 2021 · Updated 5 years ago
- Generate topic models from open text extracted from files in disk images ☆10 · Apr 11, 2023 · Updated 2 years ago
- A Discord bot for Star Wars: Galaxy of Heroes ☆12 · Mar 10, 2019 · Updated 7 years ago
- Nodejs binding for fasttext representation and classification. ☆43 · Feb 26, 2024 · Updated 2 years ago
- Dockerfiles for Avalon Media System - http://github.com/avalonmediasystem/avalon ☆10 · Feb 18, 2026 · Updated last month
- LOC Standards, Schemas, Stylesheets, etc. ☆11 · Sep 30, 2025 · Updated 5 months ago