ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

☆34,311

Alternatives and similar repositories for OCRmyPDF

Users interested in OCRmyPDF often compare it with the projects listed below.

Stirling-Tools / Stirling-PDF
#1 PDF Application on GitHub that lets you edit PDFs on any device anywhere
☆88,183 · Updated this week
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆86,350 · Jul 22, 2026 · Updated last week
opendatalab / MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
☆76,160 · Updated this week
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,209 · Mar 25, 2026 · Updated 4 months ago
tesseract-ocr / tesseract
Tesseract Open Source OCR Engine (main repository)
☆75,626 · Updated this week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,176 · Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,994 · Jul 20, 2026 · Updated last week
JaidedAI / EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and …
☆29,831 · Dec 5, 2025 · Updated 7 months ago
microsoft / markitdown
Python tool for converting files and office documents to Markdown.
☆169,975 · Updated this week
paperless-ngx / paperless-ngx
A community-supported supercharged document management system: scan, index and archive all your documents
☆43,607 · Updated this week
rustdesk / rustdesk
An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.
☆119,040 · Updated this week
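hiroi-sora / Umi-OCR
OCR software, free and offline. Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, excluding watermarks/headers and footers, and scanning/generating QR codes. Built-in multilingual language libraries.
☆46,251 · Nov 20, 2025 · Updated 8 months ago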
ollama / ollama
Get up and running with Kimi-K2.6, GLM-5.2, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
☆177,051 · Updated this week
langgenius / dify
Build Agentic workflows, RAG pipelines, with rich AI model and tool support on one collaborative workspace. Deploy on cloud, VPC, or self…
☆150,693 · Updated this week
Mintplex-Labs / anything-llm
Stop renting your intelligence. Own it with AnythingLLM. Everything you need for a powerful local-first agent experience
☆64,058 · Updated this week
getomni-ai / zerox
OCR & Document Extraction using vision models
☆12,261 · May 20, 2025 · Updated last year
lobehub / lobehub
🤯 LobeHub is your Chief Agent Operator, organizing your agents into 7×24 operations by hiring, scheduling, and reporting on your entire …
☆80,957 · Updated this week
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆86,342 · Updated this week
PDFMathTranslate / PDFMathTranslate
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved, supporting services such as Google/DeepL/Ollama/OpenAI,…
☆35,849 · May 25, 2026 · Updated 2 months ago
browser-use / browser-use
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
☆107,195 · Updated this week
n8n-io / n8n
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ in…
☆198,536 · Updated this week
docling-project / docling
Get your documents ready for gen AI
☆63,950 · Updated this week
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆75,515 · Updated this week
open-webui / open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
☆147,215 · Updated this week
khoj-ai / khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. …
☆36,097 · Jun 24, 2026 · Updated last month
yt-dlp / yt-dlp
A feature-rich command-line audio/video downloader
☆180,951 · Updated this week
openai / whisper
Robust Speech Recognition via Large-Scale Weak Supervision
☆106,032 · Updated this week
localsend / localsend
An open-source cross-platform alternative to AirDrop
☆86,211 · Updated this week
syncthing / syncthing
Open Source Continuous File Synchronization
☆87,051 · Updated this week
usememos / memos
Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.
☆61,838 · Updated this week
drawdb-io / drawdb
Free, simple, and intuitive online database diagram editor and SQL generator.
☆38,212 · Updated this week
toeverything / AFFiNE
There can be more than Notion and Miro. AFFiNE (pronounced [əˈfain]) is a next-gen knowledge base that brings planning, sorting and creati…
☆70,901 · Updated this week
GyulyVGC / sniffnet
Comfortably monitor your Internet traffic 🕵️‍♂️
☆40,178 · Updated this week
microsoft / OmniParser
A simple screen parsing tool towards pure vision based GUI agent
☆25,203 · Jul 20, 2026 · Updated last week
stanford-oval / storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
☆30,380 · Sep 30, 2025 · Updated 9 months ago
getmaxun / maxun
🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in min…
☆16,912 · Updated this week
siyuan-note / siyuan
A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Go.
☆45,478 · Updated this week
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,343 · Updated this week
hacksider / Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
☆95,359 · Updated this week