huridocs / pdf-document-layout-analysis
A Docker-powered service for PDF document layout analysis. It segments and classifies the different parts of PDF pages, identifying elements such as text, titles, pictures, and tables.
☆282 Updated this week
Alternatives and similar repositories for pdf-document-layout-analysis:
Users who are interested in pdf-document-layout-analysis compare it to the repositories listed below.
- UniTable: Towards a Unified Table Foundation Model ☆445 Updated 9 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆325 Updated 2 years ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆221 Updated 3 months ago
- A Comprehensive Benchmark for Document Parsing and Evaluation ☆288 Updated 3 weeks ago
- YOLOv10 trained on DocLayNet dataset. ☆72 Updated 4 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆954 Updated 2 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆188 Updated 10 months ago
- Lightweight, performant, deep table extraction ☆435 Updated this week
- Object Detection Model for Scanned Documents ☆90 Updated 2 weeks ago
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆185 Updated 3 weeks ago
- A Curated List of Awesome Table Structure Recognition (TSR) Research. Including models, papers, datasets and codes. Continuously updating… ☆167 Updated 6 months ago
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis ☆93 Updated last week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆830 Updated this week
- ☆87 Updated last week
- Code for explaining and evaluating late chunking (chunked pooling) ☆352 Updated 3 months ago
- TF-ID: Table/Figure IDentifier for academic papers ☆229 Updated 8 months ago
- A python library to define and validate data types in Docling. ☆87 Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆265 Updated this week
- Document Layout Analysis resources repos for development with PdfPig. ☆605 Updated last year
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆286 Updated 2 weeks ago
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆120 Updated last year
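- An inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope. ☆240 Updated 2 months ago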
- Extract structured text from pdfs quickly ☆441 Updated 3 weeks ago
- Simple package to extract text with coordinates from programmatic PDFs ☆81 Updated last week
- This project aims to extract text from PDF files using the outputs generated by the pdf-document-layout-analysis service. By leveraging t… ☆27 Updated last month
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆141 Updated 6 months ago
- OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR ☆94 Updated this week
- YOLOv11 trained on DocLayNet dataset. ☆35 Updated 4 months ago
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team… ☆1,665 Updated 2 months ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆104 Updated 6 months ago