huridocs / pdf-document-layout-analysis
A Docker-powered service for PDF document layout analysis. It segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables.
☆682Updated 2 weeks ago
Alternatives and similar repositories for pdf-document-layout-analysis
Users who are interested in pdf-document-layout-analysis are comparing it to the libraries listed below.
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆1,616Updated 5 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆824Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,729Updated 3 weeks ago
- Lightweight, performant, deep table extraction☆506Updated last month
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆262Updated 9 months ago
- Parse PDFs into markdown using Vision LLMs☆427Updated last week
- OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…☆2,252Updated last month
- Detect and extract tables to markdown and csv☆750Updated 7 months ago
- Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical …☆596Updated last week
- SmolDocling OCR App built using SmolDocling 256M Model and Streamlit.☆160Updated 5 months ago
- ☆541Updated last year
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order.☆278Updated last month
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆146Updated last year
- Python package to parse PDFs with different parsers☆202Updated last week
- ☆490Updated 6 months ago
- Inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.☆372Updated 2 weeks ago
- Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy.☆895Updated 11 months ago
- Analysis of Chinese and English layouts☆244Updated last month
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆398Updated 3 months ago
- E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with ded…☆1,228Updated last year
- A Python wrapper for the Doc2X API that comes with local text processing (to improve PDF recall in RAG).☆281Updated 3 months ago
- UniTable: Towards a Unified Table Foundation Model☆506Updated last year
- TF-ID: Table/Figure IDentifier for academic papers☆240Updated last year
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…☆1,771Updated 5 months ago
- An OpenAI DeepResearch alternative: an AI-driven research system that performs comprehensive, iterative research on any topic using multiple…☆632Updated 3 months ago
- Collects the best table recognition models currently available as open source, improves pre-processing and post-processing, and converts the models to ONNX☆830Updated last month
- ☆2,020Updated 6 months ago
- To try textin document parsing, click https://cc.co/16YSIy☆120Updated 2 months ago
- To try TextIn document parsing, visit https://cc.co/16YSIy☆164Updated 3 months ago