PublicI / pdf-gcv-ocr
Tool to OCR PDFs using Google Cloud Vision
☆38 · Updated last year
Related projects
Alternatives and complementary repositories for pdf-gcv-ocr
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆65 · Updated 10 months ago
- Ergonomic line-by-line transcription of scanned text. ☆48 · Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆180 · Updated last month
- A database of courts, tests and other experiments ☆63 · Updated 3 months ago
- Create local backups of airtable databases ☆36 · Updated last year
- Swift scripts for PDF manipulation, for Shortcuts or Terminal ☆12 · Updated last year
- Find legal citations in any block of text ☆123 · Updated 4 months ago
- Efficient hOCR tooling ☆40 · Updated 2 months ago
- A database of court reporters, tests and other experiments ☆95 · Updated last week
- Named-Entity Recognition extension for OpenRefine ☆24 · Updated last year
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆141 · Updated last year
- Conversions between various OCR formats ☆71 · Updated last year
- Guides and test data for OCR4all ☆30 · Updated 2 years ago
- Encoding the Bible in TEI, starting with the Gospels ☆24 · Updated last year
- WARC and ARC indexing and discovery tools. ☆117 · Updated 3 months ago
- Process, enhance and evaluate multiple OCR output. ☆20 · Updated 3 weeks ago
- Structured data for classical studies ☆18 · Updated 8 years ago
- OCR-D python tools ☆33 · Updated 3 months ago
- A collection of regular expressions for matching citations to state, federal, and even international law ☆33 · Updated 3 years ago
- An extensible tool to generate hyperlinks from legal citations ☆32 · Updated last month
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆52 · Updated last year
- An online annotation platform for teaching and learning in the humanities. ☆106 · Updated 3 weeks ago
- Airtable backup script package ☆22 · Updated 2 years ago
- Master repository which includes most other OCR-D repositories as submodules ☆72 · Updated last month
- CollectionBuilder-CSV is a "stand alone" template for creating digital collection and exhibit websites using Jekyll and a metadata CSV. ☆23 · Updated this week
- Analyze XML extracted from PDFs (e.g. from TET or PDFMiner) ☆20 · Updated 6 years ago
- Reading legal authority for the last time ☆34 · Updated 6 months ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 6 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆33 · Updated last week
- List of tools for dealing with the wonderful PDF format. ☆45 · Updated 4 years ago