Pankrat / pdf-ocr-overlay
Simple way to make scanned PDFs searchable
☆30Updated 13 years ago
Alternatives and similar repositories for pdf-ocr-overlay
Users that are interested in pdf-ocr-overlay are comparing it to the libraries listed below
- Extract tables from PDF pages.☆298Updated 5 years ago
- A simple viewer and inspection tool for text boxes in PDF documents☆96Updated 3 years ago
- Python library for creating, reading, and modifying Docx files for Microsoft Word☆15Updated 11 years ago
- Command line tool to convert spreadsheets to databases, made for the UK's Office for National Statistics.☆81Updated 2 years ago
- Binary Python bindings for poppler utils for content extraction☆42Updated 4 years ago
- Near-duplicate detection tool☆24Updated 9 years ago
- Create Physical Web Eddystone Beacons using Raspberry Pi, Node.js☆22Updated 10 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz☆39Updated last year
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format☆46Updated 10 months ago
- Convert PDF to HTML without losing text or format.☆21Updated 10 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture☆38Updated last year
- HTML5 Customizable Reader & Admin Console - Librelio Digital Publishing Suite☆29Updated 10 years ago
- XML files for toptal article☆24Updated 10 years ago
- A library for extracting tables from PDF files☆92Updated 5 years ago
- A cross-platform utility to join, split, stamp, and rotate PDFs written in Python. Yes, Python!☆39Updated 2 years ago
- Crawler that collects and extracts content of daily published news articles☆12Updated 2 years ago
- PDF to XML ALTO file converter☆261Updated this week
- The HTML5 PivotViewer is a fork of a project that was started by LobsterPot Solutions as a cross browser, cross platform version of the S…☆127Updated last year
- Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.☆84Updated 9 years ago
- Visualization Storytelling Components☆32Updated 11 years ago
- Tribe extracts a network from an email mbox and writes it to a graphml file for visualization and analysis.☆80Updated 2 years ago
- Fast Word Segmentation with Triangular Matrix☆87Updated 4 years ago
- git-xltrail-addin is a BSD-licensed Excel Addin that integrates Git with Excel and vice versa. It works with Microsoft Excel on Windows.☆14Updated 7 years ago
- Javascript library to talk to multiple OLAP backends from multiple frontends☆17Updated 13 years ago
- An expandable and scalable OCR pipeline☆89Updated 8 years ago
- A small Docker built for the OCRopus OCR system.☆19Updated 8 years ago
- Sentiment analysis in Keras and MXNet☆35Updated 8 years ago
- A general purpose PDF text-layer redaction tool for Python 2/3.☆208Updated last year
- List of SIC codes and descriptions from authoritative sources☆12Updated 8 years ago
- Display data in Excel tables and charts, and programmatically change the formatting of tables and charts.☆26Updated 3 years ago