ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆14,083 Updated this week
Related projects
Alternatives and complementary repositories for OCRmyPDF
- A Python library for reading and writing PDF, powered by QPDF ☆2,177 Updated 2 weeks ago
- Community maintained fork of pdfminer - we fathom PDF ☆5,948 Updated 3 months ago
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables. ☆6,708 Updated last month
- Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and … ☆24,426 Updated last month
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆3,827 Updated this week
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,632 Updated 3 months ago
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆5,517 Updated this week
- A Python library to extract tabular data from PDFs ☆3,010 Updated 2 months ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files ☆8,309 Updated this week
- 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and mor… ☆22,237 Updated this week
- Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4-3.12. You feed it your Python app, it doe… ☆12,014 Updated this week
- 🦄 A file manager / web client for SFTP, S3, FTP, WebDAV, Git, Minio, LDAP, CalDAV, CardDAV, Mysql, Backblaze, ... ☆10,495 Updated this week
- Python PDF Parser (Not actively maintained). Check out pdfminer.six. ☆5,254 Updated last year
- Tesseract Open Source OCR Engine (main repository) ☆62,200 Updated this week
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆44,090 Updated this week
- A Python wrapper for the tesseract-ocr API ☆2,014 Updated 2 months ago
- A Gtk/Qt front-end to tesseract-ocr. ☆1,634 Updated 2 months ago
- Build your personal knowledge base with Trilium Notes ☆27,334 Updated 3 months ago
- wallabag is a self hostable application for saving web pages: Save and classify articles. Read them later. Freely. ☆10,442 Updated this week
- Convert PDF to markdown quickly with high accuracy ☆17,603 Updated this week
- Fast, secure, efficient backup program ☆26,489 Updated this week
- Python script to do PDF OCR conversion using Tesseract ☆373 Updated last year
- Web app for browsing, reading and downloading eBooks stored in a Calibre database ☆13,023 Updated this week
- Simple wrapper of tabula-java: extract table from PDF into pandas DataFrame ☆2,186 Updated 3 weeks ago
- Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022 ☆5,818 Updated 3 months ago
- A Python wrapper for Google Tesseract ☆5,845 Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆13,874 Updated this week
- Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The … ☆1,797 Updated this week
- borb is a library for reading, creating and manipulating PDF files in python. ☆3,393 Updated last week
- Camelot: PDF Table Extraction for Humans ☆3,661 Updated last year