PublicI / pdf-gcv-ocr
Tool to OCR PDFs using Google Cloud Vision
☆42 Updated 3 years ago
Alternatives and similar repositories for pdf-gcv-ocr
Users who are interested in pdf-gcv-ocr are comparing it to the libraries listed below.
- gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf. ☆106 Updated 5 years ago
- A database of court reporters, tests and other experiments ☆117 Updated 3 weeks ago
- Jurisdiction ID and abbreviation data files for using with Jurism and other projects. ☆38 Updated 2 years ago
- an extensible tool to generate hyperlinks from legal citations ☆40 Updated last year
- A Twitter data collection and appraisal application. ☆51 Updated 2 years ago
- A commandline tool and Python library for archiving data from Facebook using the Graph API. ☆78 Updated 7 years ago
- A database of courts, tests and other experiments ☆97 Updated 2 months ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆196 Updated 6 months ago
- Ergonomic line-by-line transcription of scanned text. ☆54 Updated 5 years ago
- Social Feed Manager user interface application. ☆156 Updated last year
- Find legal citations in any block of text ☆192 Updated 2 months ago
- Named-Entity Recognition extension for OpenRefine ☆29 Updated 3 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 Updated 3 years ago
- Structured data for classical studies ☆19 Updated 9 years ago
- A free tool to OCR a PDF and add a text "layer" in the original file, making a searchable PDF. Use only open source tools. Please tip! ☆299 Updated 6 months ago
- Examples for getting started using https://case.law ☆69 Updated 3 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file ☆125 Updated 10 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆53 Updated 3 weeks ago
- Convert a PDF via OCR to a TXT file in UTF-8 encoding ☆154 Updated 2 years ago
- A collection of regular expressions for matching citations to state, federal, and even international law ☆40 Updated 4 years ago
- Please note that the warc-indexer tool & code is now supported by NetArchiveSuite. The 'warc-indexer' directory and code that exists in t… ☆131 Updated 3 weeks ago
- Python script for converting MBOX files to CSV. ☆92 Updated 4 years ago
- Make a searchable pdf via Google Cloud Vision OCR ☆14 Updated 5 years ago
- CollectionBuilder-CSV is a "stand alone" template for creating digital collection and exhibit websites using Jekyll and a metadata CSV. ☆35 Updated last month
- Abbreviations for use with the Abbreviation Filter developed for use with Multilingual Zotero. ☆18 Updated 2 years ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives. ☆148 Updated 2 weeks ago
- Core development repository. gitHub: Vsn 6 (2020 - ), Vsn 5 (2018 - 2020), Vsn 4 (2014-2017). Sourceforge: Vsn 3 (2009-2013), Vsn 1 & 2 (… ☆64 Updated this week
- 📚 A compilation of research relevant to Data Together's efforts tackling the general problem of data resilience & interactivity ☆97 Updated 7 years ago
- An online annotation platform for teaching and learning in the humanities. ☆108 Updated 3 weeks ago
- Process, enhance and evaluate multiple OCR output. ☆24 Updated 2 weeks ago