ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆14,815 Updated this week
Alternatives and similar repositories for OCRmyPDF:
Users who are interested in OCRmyPDF are comparing it to the libraries listed below:
- Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and … ☆25,321 Updated 4 months ago
- Tesseract Open Source OCR Engine (main repository) ☆64,086 Updated last week
- A Python library for reading and writing PDF, powered by QPDF ☆2,249 Updated this week
- Install and Run Python Applications in Isolated Environments ☆11,008 Updated this week
- 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and mor… ☆23,039 Updated last week
- A Python wrapper for Google Tesseract ☆5,983 Updated last week
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆6,306 Updated this week
- get things from one computer to another, safely ☆20,753 Updated this week
- "rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, A… ☆48,472 Updated this week
- Multi functional app to find duplicates, empty folders, similar images etc. ☆21,455 Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆15,899 Updated this week
- 📚 Collaborative cheatsheets for console commands ☆53,259 Updated this week
- Pyodide is a Python distribution for the browser and Node.js based on WebAssembly ☆12,611 Updated this week
- A Python wrapper for the tesseract-ocr API ☆2,045 Updated 2 months ago
- A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos. ☆12,556 Updated 5 months ago
- A Python library to extract tabular data from PDFs ☆3,126 Updated last week
- Convert PDF to markdown + JSON quickly with high accuracy ☆19,921 Updated this week
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆37,193 Updated 5 months ago
- Fast, secure, efficient backup program ☆27,417 Updated this week
- Grist is the evolution of spreadsheets. ☆7,716 Updated this week
- An interactive TLS-capable intercepting HTTP proxy for penetration testers and software developers. ☆37,684 Updated this week
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆46,004 Updated this week
- Tesseract Open Source OCR Engine (main repository) ☆3,273 Updated 2 months ago
- Port of OpenAI's Whisper model in C/C++ ☆37,152 Updated last week
- 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows ☆10,041 Updated this week
- An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and And… ☆18,068 Updated this week
- Dolt – Git for Data ☆18,218 Updated this week
- VS Code in the browser ☆69,550 Updated last week
- 🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM. ☆20,380 Updated 3 weeks ago
- Glances an Eye on your system. A top/htop alternative for GNU/Linux, BSD, Mac OS and Windows operating systems. ☆27,486 Updated this week