Pankrat / pdf-ocr-overlay
Simple way to make scanned PDFs searchable
☆30 · Updated 13 years ago
Alternatives and similar repositories for pdf-ocr-overlay
Users interested in pdf-ocr-overlay compare it with the repositories listed below.
- Scripts and results from our OCR roundup, available on Source ☆150 · Updated 6 years ago
- Get semantic HTML from PDFs, recover lost text, tables, data... in bulk. ☆34 · Updated 11 months ago
- Working with hOCR in JavaScript ☆136 · Updated 2 years ago
- Extract tables from PDF pages. ☆298 · Updated 5 years ago
- A simple viewer and inspection tool for text boxes in PDF documents ☆95 · Updated 3 years ago
- OpenSeadragonizer: zooming browser extension ☆18 · Updated 3 weeks ago
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆402 · Updated last year
- XML files for a Toptal article ☆24 · Updated 9 years ago
- Parse MS Outlook PST/OST files from Node.js ☆15 · Updated 13 years ago
- GDG London hackathon. Prototype for an Android app that displays public data about your location in an infographic style. ☆24 · Updated 12 years ago
- lachesis automates the segmentation of a transcript into closed captions ☆35 · Updated 8 years ago
- gcv2hocr converts Google Cloud Vision OCR output to hOCR to make a searchable PDF. ☆107 · Updated 5 years ago
- Crawler that collects and extracts content of daily published news articles ☆12 · Updated 2 years ago
- Web-based JavaScript GUI library for proofreading/editing hOCR ☆100 · Updated 7 years ago
- Tools to convert AIML code into RiveScript code. ☆19 · Updated 8 years ago
- Ergonomic line-by-line transcription of scanned text. ☆54 · Updated 4 years ago
- Convert images to a styled, minimal representation, quickly, with NumPy ☆27 · Updated 8 years ago
- Download, aggregate, and filter RSS feeds. ☆67 · Updated 10 years ago
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 · Updated 11 years ago
- This tool removes the background of an image based on manually added markers (based on OpenCV) ☆17 · Updated 7 years ago
- ScanBooth is a collection of software for running a 3D photo booth. It includes tools for automating 3D scan capture, cleanup, printing a… ☆34 · Updated 13 years ago
- Stitches scanned image segments ☆64 · Updated 12 years ago
- PDF to XML ALTO file converter ☆254 · Updated this week
- Git snapshot of the CamStudio hg repo (http://sourceforge.net/scm/?type=hg&group_id=131922) ☆16 · Updated 14 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 · Updated 7 months ago
- The hOCR Embedded OCR Workflow and Output Format ☆75 · Updated last year
- Visualization Storytelling Components ☆32 · Updated 11 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 · Updated last year
- Display data in Excel tables and charts, and programmatically change the formatting of tables and charts. ☆27 · Updated 2 years ago
- Azure Search Cognitive Skill to extract technical and business skills from text ☆81 · Updated last year