A Docker-powered service for PDF document layout analysis. It segments and classifies the parts of each PDF page, identifying elements such as text, titles, pictures, and tables.
☆1,142 · May 6, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for pdf-document-layout-analysis
Users that are interested in pdf-document-layout-analysis are comparing it to the libraries listed below.
- This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-an… ☆20 · Feb 3, 2025 · Updated last year
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆2,166 · Apr 14, 2025 · Updated last year
- ☆16 · Apr 26, 2024 · Updated 2 years ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆432 · Feb 1, 2023 · Updated 3 years ago
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team… ☆1,829 · Mar 17, 2026 · Updated 2 months ago
- Reading order, Layoutreader ☆18 · May 8, 2025 · Updated last year
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,669 · Jan 3, 2025 · Updated last year
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order. ☆320 · Aug 15, 2025 · Updated 9 months ago
- Python package to parse PDFs with different parsers ☆268 · Sep 12, 2025 · Updated 8 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆8,127 · Feb 10, 2025 · Updated last year
- https://no-ocr.com/about ☆184 · Jun 30, 2025 · Updated 10 months ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,336 · Mar 25, 2026 · Updated 2 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,756 · May 6, 2026 · Updated 2 weeks ago
- UniTable: Towards a Unified Table Foundation Model ☆531 · Apr 21, 2026 · Updated last month
- A Unified Toolkit for Deep Learning Based Document Image Analysis ☆5,735 · Aug 15, 2024 · Updated last year
- Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. ☆64,676 · Updated this week
- OCR & Document Extraction using vision models ☆12,233 · May 20, 2025 · Updated last year
- Yet Another Document Translator ☆8,518 · May 9, 2026 · Updated 2 weeks ago
- yet another m3u8 player ☆13 · Jun 8, 2025 · Updated 11 months ago
- PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books. ☆5,666 · Apr 30, 2026 · Updated 3 weeks ago
- A high-quality PDF to Markdown tool based on large language model visual recognition. ☆1,746 · Jan 25, 2026 · Updated 4 months ago
- Convert PDF to markdown + JSON quickly with high accuracy ☆35,381 · May 5, 2026 · Updated 3 weeks ago
- A Repo For Document AI ☆3,169 · May 15, 2026 · Updated last week
- Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical … ☆701 · May 4, 2026 · Updated 3 weeks ago
- Convert Everything to PDF ☆228 · Feb 1, 2026 · Updated 3 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆2,018 · Mar 17, 2026 · Updated 2 months ago
- ☆1,372 · May 13, 2026 · Updated 2 weeks ago
- Completely free, private, UI based Tech Documentation MCP server. Designed with coders and software developers in mind. Easily integrate i… ☆2,076 · Feb 4, 2026 · Updated 3 months ago
- YOLO models trained on DocLayNet - power your document intelligence with layout analysis ☆157 · Mar 10, 2026 · Updated 2 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆277 · Dec 6, 2025 · Updated 5 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,759 · May 6, 2026 · Updated 3 weeks ago
- Document Layout Analysis resources repos for development with PdfPig. ☆635 · Oct 1, 2023 · Updated 2 years ago
- YOLOv11 trained on DocLayNet dataset. ☆56 · Nov 4, 2024 · Updated last year
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆9,780 · Updated this week
- ☆49 · Jul 4, 2024 · Updated last year
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute ☆306 · Sep 10, 2024 · Updated last year
- Object Detection Model for Scanned Documents ☆94 · Mar 6, 2025 · Updated last year
- Get your documents ready for gen AI ☆60,372 · Updated this week
- Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what… ☆332 · Feb 9, 2025 · Updated last year