Convert a corpus of PDFs to clean text files on a distributed architecture
☆39 Mar 5, 2024 Updated 2 years ago
Alternatives and similar repositories for ocr-pipeline
Users who are interested in ocr-pipeline are comparing it to the libraries listed below
- Trading Consequences data and code ☆15 Mar 5, 2015 Updated 11 years ago
- This is a Django project template using uWSGI as application server. ☆10 May 15, 2019 Updated 6 years ago
- Named Entity Recognition tool for Europeana Newspapers ☆14 Apr 5, 2018 Updated 7 years ago
- Images of Text to Text: Call Tesseract from Python and OCR a directory of pdfs ☆16 Oct 7, 2019 Updated 6 years ago
- Some bits of javascript to transcribe scanned pages using PageXML ☆17 Mar 18, 2024 Updated last year
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆25 Sep 14, 2016 Updated 9 years ago
- Want to learn more about Free Law Project technologies, policies and thinking? Get the literature here. ☆25 Jul 6, 2021 Updated 4 years ago
- Pure python script that takes user query and summarizes news related to it. ☆25 Jul 6, 2022 Updated 3 years ago
- ☆19 Dec 8, 2021 Updated 4 years ago
- Charter, TSC, and other governance documents. ☆15 Mar 13, 2024 Updated last year
- CLP Principles by the Liquid Legal Institute ☆11 Apr 12, 2021 Updated 4 years ago
- AI and IoT based Smart Parking ☆10 Apr 15, 2022 Updated 3 years ago
- Rails application supporting the creation of OCR and the IIIF Content Search API ☆34 Dec 14, 2022 Updated 3 years ago
- Simple app for visual editing of Page XML files ☆31 Sep 25, 2025 Updated 5 months ago
- Identify objects in an image, additionally assigning each pixel of the image to a particular object ☆31 Sep 17, 2025 Updated 5 months ago
- Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Libr… ☆43 Updated this week
- The second generation of the Triangle Regional Model ☆13 Feb 25, 2026 Updated last week
- Wordpress plugin for Magic the Gathering that enables card tooltips and formatted deck listings. ☆13 Dec 24, 2025 Updated 2 months ago
- A framework, data and configs for generating and building Tesseract OCR lang.traineddata model files, specifically for Japanese ☆10 Dec 9, 2013 Updated 12 years ago
- A repository for documentation and tutorials (recipes) that help us cook up great projects ☆12 Aug 31, 2023 Updated 2 years ago
- How to add formulas to Google Spreadsheet using Google Apps Script - Sarmad Gardezi ☆17 Apr 24, 2025 Updated 10 months ago
- A collection of introductions to various datasets, giving journalists some friendly background before they start doing analysis. Like "Hi… ☆71 Sep 2, 2014 Updated 11 years ago
- Ontology, processing practices and supporting code for change tracking of SKOS vocabularies ☆40 Jan 26, 2024 Updated 2 years ago
- Glyph Miner, a system for extracting glyphs from early typeset prints ☆34 Sep 29, 2016 Updated 9 years ago
- See https://github.com/Dallinger/Dallinger/ for the latest. ☆37 Apr 15, 2023 Updated 2 years ago
- The Fallacy of Placing Confidence in Confidence Intervals ☆37 Oct 11, 2015 Updated 10 years ago
- Library Linked Data for building knowledge cards. ☆33 Feb 26, 2025 Updated last year
- Mailjet API transactional templating samples ☆35 Feb 28, 2022 Updated 4 years ago
- Tamil Language words list ☆12 Jul 2, 2016 Updated 9 years ago
- Grecka is a python script to convert Greek to Greeklish based on ELOT 743 ☆12 Aug 4, 2018 Updated 7 years ago
- A collection of OCR'd and machine-corrected Greek texts. This base repository contains Git submodules for the different works and an inve… ☆11 Nov 18, 2014 Updated 11 years ago
- Static photoessay generator using gulp.js ☆10 Mar 20, 2019 Updated 6 years ago
- Voevodsky's 2006 paper on homotopy lambda calculus ☆15 Jan 11, 2015 Updated 11 years ago
- ☆11 Aug 10, 2022 Updated 3 years ago
- Elixir shipping library ☆14 May 22, 2024 Updated last year
- Create a Google Sheet from a CSV file preventing auto-formatting of date and number fields ☆10 Jun 27, 2017 Updated 8 years ago
- Arduino library to generate a PWM signal over a shift register (74HC595) ☆12 Sep 25, 2020 Updated 5 years ago
- A set of R scripts to visualize and analyze bias in the polls ☆24 Sep 21, 2013 Updated 12 years ago
- Amazon Marketing Cloud Insights on AWS helps advertisers and agencies running campaigns on Amazon Ads to easily deploy AWS services to st… ☆16 Nov 3, 2025 Updated 4 months ago