huridocs / pdf-document-layout-analysis
A Docker-powered service for PDF document layout analysis. It segments and classifies the parts of each PDF page, identifying elements such as text, titles, pictures, and tables.
☆256 Updated 2 weeks ago
Alternatives and similar repositories for pdf-document-layout-analysis:
Users who are interested in pdf-document-layout-analysis are comparing it to the libraries listed below:
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆316 Updated 2 years ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆205 Updated 2 months ago
- A Faster LayoutReader Model based on LayoutLMv3 that sorts OCR bboxes into reading order. ☆167 Updated 8 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆857 Updated last month
- A Comprehensive Benchmark for Document Parsing and Evaluation ☆238 Updated last week
- Object Detection Model for Scanned Documents ☆88 Updated last year
- UniTable: Towards a Unified Table Foundation Model ☆432 Updated 8 months ago
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆763 Updated last week
- A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets, and code. Continuously updating… ☆160 Updated 5 months ago
- YOLO models trained on DocLayNet: power your document intelligence with layout analysis ☆82 Updated last month
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating. ☆175 Updated 2 months ago
- YOLOv10 trained on DocLayNet dataset. ☆71 Updated 3 months ago
- Lightweight, performant, deep table extraction ☆410 Updated this week
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆117 Updated last year
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆1,492 Updated this week
- ☆74 Updated last week
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆268 Updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆324 Updated last month
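- An inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope. ☆201 Updated last month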
- Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset ☆26 Updated last year
- Document Layout Analysis resources for development with PdfPig. ☆602 Updated last year
- Document Layout Analysis ☆359 Updated last month
- Simple package to extract text with coordinates from programmatic PDFs ☆68 Updated this week
- ☆53 Updated 7 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆244 Updated this week
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy. ☆279 Updated this week
- TF-ID: Table/Figure IDentifier for academic papers ☆228 Updated 7 months ago
- Table Structure Recognition ☆66 Updated last year
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence ☆138 Updated 5 months ago
- A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition ☆128 Updated last week