ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆15,867 Updated last week
Alternatives and similar repositories for OCRmyPDF:
Users who are interested in OCRmyPDF are comparing it to the repositories listed below.
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆16,080 Updated this week
- Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and … ☆25,422 Updated 4 months ago
- The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on… ☆30,055 Updated this week
- Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models. ☆121,535 Updated this week
- Community maintained fork of pdfminer - we fathom PDF ☆6,171 Updated 6 months ago
- A Python library for reading and writing PDF, powered by QPDF ☆2,252 Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy ☆20,307 Updated this week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables. ☆7,175 Updated this week
- SearXNG is a free internet metasearch engine which aggregates results from various search services and databases. Users are neither track… ☆15,787 Updated this week
- The fastest knowledge base for growing teams. Beautiful, realtime collaborative, feature packed, and markdown compatible. ☆29,677 Updated this week
- Focalboard is an open source, self-hosted alternative to Trello, Notion, and Asana. ☆22,697 Updated 4 months ago
- A monitor of resources ☆22,801 Updated this week
- User-friendly AI Interface (Supports Ollama, OpenAI API, ...) ☆68,175 Updated this week
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆6,388 Updated this week
- Tesseract Open Source OCR Engine (main repository) ☆64,321 Updated 3 weeks ago
- A Python wrapper for Google Tesseract ☆5,997 Updated this week
- rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc. ☆8,527 Updated last month
- The easiest, most secure way to use WireGuard and 2FA. ☆20,814 Updated this week
- extract text from any document. no muss. no fuss. ☆3,965 Updated 2 months ago
- Send push notifications to your phone or desktop using PUT/POST ☆19,857 Updated 4 months ago
- An open source, self-hosted implementation of the Tailscale control server ☆25,179 Updated this week
- A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents. ☆2,233 Updated 2 years ago
- Open source Python library for converting PDF to DOCX. ☆2,735 Updated 4 months ago
- A community-supported supercharged version of paperless: scan, index and archive all your physical documents ☆24,880 Updated this week
- Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provi… ☆46,193 Updated this week
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files ☆8,693 Updated this week
- The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more. ☆33,867 Updated this week
- A python module that wraps the pdftoppm utility to convert PDF to PIL Image object ☆1,699 Updated 6 months ago
- Links to awesome OCR projects ☆2,888 Updated 7 months ago
- Multi functional app to find duplicates, empty folders, similar images etc. ☆21,816 Updated 2 weeks ago