PublicI / pdf-gcv-ocr
Tool to OCR PDFs using Google Cloud Vision
☆41 Updated 2 years ago
Alternatives and similar repositories for pdf-gcv-ocr:
Users who are interested in pdf-gcv-ocr are comparing it to the libraries listed below:
- gcv2hocr converts from Google Cloud Vision OCR output to hOCR to make a searchable PDF. ☆106 Updated 4 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆187 Updated last month
- Conversions between various OCR formats ☆74 Updated last year
- Master repository which includes most other OCR-D repositories as submodules ☆72 Updated last week
- A database of courts, tests and other experiments ☆69 Updated last month
- Structured data for classical studies ☆19 Updated 8 years ago
- Make a searchable PDF via Google Cloud Vision OCR ☆14 Updated 5 years ago
- Guides and test data for OCR4all ☆30 Updated 2 years ago
- Process, enhance and evaluate multiple OCR outputs. ☆22 Updated 5 months ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆53 Updated last year
- The Syriac New Testament in Text-Fabric ☆12 Updated last year
- Automatic alignment of books between HathiTrust, Internet Archive, Google Books, etc. ☆35 Updated this week
- An expandable and scalable OCR pipeline ☆87 Updated 7 years ago
- The CIS OCR PostCorrectionTool ☆41 Updated 2 years ago
- Efficient hOCR tooling ☆42 Updated last month
- Metadata and per-statute PDFs for the U.S. Statutes at Large through volume 64 (1789-1951). ☆15 Updated 4 years ago
- A financial disclosure data extraction tool. ☆14 Updated last year
- A collection of regular expressions for matching citations to state, federal, and even international law ☆33 Updated 3 years ago
- A Twitter data collection and appraisal application. ☆51 Updated 2 years ago
- Tool that does layout analysis and/or text recognition using Tesseract and outputs the result in Page XML format ☆46 Updated 11 months ago
- Abbreviations for use with the Abbreviation Filter developed for use with Multilingual Zotero. ☆17 Updated last year
- Python script for converting MBOX files to CSV. ☆88 Updated 3 years ago
- CollectionBuilder-CSV is a "stand alone" template for creating digital collection and exhibit websites using Jekyll and a metadata CSV. ☆26 Updated last week
- A community-curated collection of judge profile pics that can be integrated anywhere ☆25 Updated 3 weeks ago
- Collection of OCR-related Python tools and wrappers from @OCR-D ☆128 Updated this week
- A digital humanities operating system that runs on a USB disk. ☆31 Updated 7 years ago
- A database of court reporters, tests and other experiments ☆102 Updated last month
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆389 Updated 7 months ago
- OCR-D Python tools ☆33 Updated 7 months ago