andreysaf / PDF-SDK-Evaluation-2021
What to look for in a PDF library in 2021
☆27 Updated 4 years ago
Alternatives and similar repositories for PDF-SDK-Evaluation-2021
Users who are interested in PDF-SDK-Evaluation-2021 are comparing it to the libraries listed below.
- Fast Neural Machine Translation in C++ - development repository ☆19 Updated last year
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆23 Updated 10 years ago
- Indri search implementation on top of Lucene search engine ☆34 Updated last year
- A selection of test lines of several early printed books as well as the corresponding individual OCRopus models and mixed models. ☆10 Updated 7 years ago
- Ergonomic line-by-line transcription of scanned text. ☆51 Updated 4 years ago
- Node starter kit for semantic-search. Uses Mighty Inference Server with Qdrant vector search. ☆15 Updated 2 years ago
- Tools to evaluate accuracies of various (research papers') metadata extraction libraries ☆11 Updated 9 years ago
- Logical structure analysis for visually structured documents ☆89 Updated 2 years ago
- An efficient data structure for fast string similarity searches ☆22 Updated 4 years ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- Open Access PDF harvester ☆40 Updated last year
- Tools for evaluating OCR performance relative to ground truth. ☆10 Updated last year
- An intelligent OCR to detect tables and pure text inside PDFs and obtain a CSV file and a TXT file from them ☆14 Updated 6 years ago
- 'ocr-evaluation-tools' from http://ancientgreekocr.org/. Tools to test OCR accuracy. ☆22 Updated 7 years ago
- Open Collaborative AI Driven Parser builder for Web Scraping, Data Extraction and Crawling, Knowledge Graph ☆1 Updated 3 months ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node ☆17 Updated 2 years ago
- Tool that does layout analysis and/or text recognition using tesseract and outputs the result in Page XML format ☆46 Updated last month
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆47 Updated 2 years ago
- In-browser OCR of Ancient Greek and Latin ☆26 Updated 3 weeks ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆69 Updated last month
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 2 weeks ago
- Unreliable News Index (for Columbia Journalism Review) ☆56 Updated 3 years ago
- PDF Extraction Toolkit ☆41 Updated 4 years ago
- A mirror of the PRONOM file format registry in Linked Open Data format. The Format Registry is a linked (open) data file format repositor… ☆10 Updated last year
- Implementation of BertGrid : https://arxiv.org/abs/1909.04948 ☆30 Updated last year
- User contributed (non Google) OCR models for Tesseract ☆27 Updated 3 weeks ago
- Wrapper for pdftohtml that tries to extract paragraph structure ☆50 Updated 6 years ago
- gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf. ☆106 Updated 4 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year