ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆14,166 · Updated this week
Related projects
Alternatives and complementary repositories for OCRmyPDF
- Project documentation with Markdown. ☆19,444 · Updated 2 weeks ago
- A Python library for reading and writing PDF, powered by QPDF ☆2,186 · Updated this week
- Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and … ☆24,553 · Updated last month
- 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and mor… ☆22,370 · Updated this week
- Tesseract Open Source OCR Engine (main repository) ☆62,429 · Updated last week
- A Python wrapper for Google Tesseract ☆5,868 · Updated 3 weeks ago
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆5,677 · Updated this week
- 🦄 A file manager / web client for SFTP, S3, FTP, WebDAV, Git, Minio, LDAP, CalDAV, CardDAV, Mysql, Backblaze, ... ☆10,566 · Updated this week
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files ☆8,388 · Updated this week
- A community-supported supercharged version of paperless: scan, index and archive all your physical documents ☆22,069 · Updated this week
- Community maintained fork of pdfminer - we fathom PDF ☆5,961 · Updated 3 months ago
- The Free Software Media System ☆35,186 · Updated this week
- Open Source Continuous File Synchronization ☆65,624 · Updated this week
- Full-featured and highly configurable SFTP, HTTP/S, FTP/S and WebDAV server - S3, Google Cloud Storage, Azure Blob ☆9,451 · Updated this week
- Simple bookmark manager built with Go ☆9,506 · Updated this week
- Python composable command line interface toolkit ☆15,789 · Updated last week
- Cross-platform backup tool for Windows, macOS & Linux with fast, incremental backups, client-side end-to-end encryption, compression and … ☆8,110 · Updated this week
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆3,883 · Updated this week
- Open source Python library for converting PDF to DOCX. ☆2,610 · Updated last month
- Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems. ☆26,902 · Updated this week
- "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, A… ☆47,250 · Updated this week
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,220 · Updated 2 years ago
- AI-Powered Photos App for the Decentralized Web 🌈 💎✨ ☆35,454 · Updated this week
- get things from one computer to another, safely ☆20,452 · Updated this week
- 🎧☁️ Your Personal Streaming Service ☆12,114 · Updated this week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables. ☆6,749 · Updated last week
- Easily and securely send things from one computer to another ☆28,161 · Updated last week
- API Support for your favorite torrent trackers ☆12,371 · Updated this week
- Multi functional app to find duplicates, empty folders, similar images etc. ☆20,270 · Updated last month
- Streamlink is a CLI utility which pipes video streams from various services into a video player ☆10,093 · Updated this week