ocrmypdf / OCRmyPDF
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
☆32,561 Updated this week
Alternatives and similar repositories for OCRmyPDF
Users who are interested in OCRmyPDF are comparing it to the repositories listed below.
- Python tool for converting files and office documents to Markdown. ☆86,605 Updated last month
- OCR & Document Extraction using vision models ☆12,136 Updated 8 months ago
- #1 PDF Application on GitHub that lets you edit PDFs on any device anywhere ☆74,057 Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,228 Updated last week
- Convert PDF to markdown + JSON quickly with high accuracy ☆31,582 Updated this week
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables. ☆9,699 Updated 2 weeks ago
- Open source Python library for converting PDF to DOCX. ☆3,290 Updated 8 months ago
- The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harne… ☆72,187 Updated this week
- Convert PDF to HTML without losing text or format. ☆5,409 Updated 6 months ago
- [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with layout fully preserved, supporting Google/DeepL/Ollama/OpenAI and other services,… ☆31,769 Updated 2 months ago
- 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and mor… ☆26,795 Updated last week
- PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books. ☆4,816 Updated this week
- Robust Speech Recognition via Large-Scale Weak Supervision ☆94,315 Updated last month
- Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/… ☆70,442 Updated last week
- An open-source, self-hosted note-taking service. Your thoughts, your data, your control — no tracking, no ads, no subscription fees. ☆56,639 Updated this week
- Community maintained fork of pdfminer - we fathom PDF ☆6,889 Updated last week
- Yet Another Document Translator ☆7,690 Updated 3 weeks ago
- Get your documents ready for gen AI ☆52,799 Updated this week
- Toolkit for linearizing PDFs for LLM datasets/training ☆16,860 Updated last week
- Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix-level subtitle cut… ☆15,911 Updated 8 months ago
- Financial data platform for analysts, quants and AI agents. ☆59,919 Updated this week
- A Python library for reading and writing PDF, powered by QPDF ☆2,633 Updated last week
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆9,019 Updated this week
- A simple screen parsing tool towards pure vision based GUI agent ☆24,344 Updated 5 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,203 Updated last year
- Virtual whiteboard for sketching hand-drawn like diagrams ☆116,235 Updated this week
- A browser extension for automating your browser by connecting blocks ☆21,007 Updated this week
- A modern ebook manager and reader with sync and backup capacities for Windows, macOS, Linux, Android, iOS and Web ☆26,015 Updated this week
- The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface. ☆102,600 Updated last week
- AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs ☆39,697 Updated this week