yash9439 / Detectron-Layout-Parser
This code performs PDF layout analysis and optical character recognition (OCR) using the layoutparser library and the Tesseract OCR engine. It detects the layout of a PDF document and extracts text from selected regions. The code is organized into several sections, each with a specific purpose.
☆ 14 · Updated last year
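The region-selection step described above (detect the page layout, then extract text only from specific regions) can be sketched in plain Python. This is not the repository's code. It assumes each detected region is a bounding box with a type label and a confidence score, which is the kind of output a layout detector used with layoutparser produces. A hypothetical `ocr` callable stands in for Tesseract so the sketch runs without either dependency. The names `Block`, `select_regions`, and `extract_text`, the label set, and the 0.5 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Iterable, List


@dataclass(frozen=True)
class Block:
    """A detected layout region (hypothetical stand-in for a detector's output)."""
    x1: float     # left edge
    y1: float     # top edge
    x2: float     # right edge
    y2: float     # bottom edge
    label: str    # region type, e.g. "Text", "Title", "Figure" (assumed labels)
    score: float  # detector confidence in [0, 1]


def select_regions(
    blocks: Iterable[Block], keep: AbstractSet[str], min_score: float = 0.5
) -> List[Block]:
    """Keep blocks of the wanted types at or above a confidence threshold,
    sorted top-to-bottom, then left-to-right (a naive reading order)."""
    chosen = [b for b in blocks if b.label in keep and b.score >= min_score]
    return sorted(chosen, key=lambda b: (b.y1, b.x1))


def extract_text(
    blocks: Iterable[Block],
    ocr: Callable[[Block], str],
    keep: AbstractSet[str] = frozenset({"Text", "Title"}),
    min_score: float = 0.5,
) -> List[str]:
    """Run the assumed OCR callable on each selected region, in reading order,
    and strip surrounding whitespace from each result."""
    return [ocr(b).strip() for b in select_regions(blocks, keep, min_score)]
```

Sorting by the top edge and then the left edge is the simplest reading order. It works for single-column pages but would interleave text from a multi-column layout, where blocks would need to be grouped by column before sorting.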
Alternatives and similar repositories for Detectron-Layout-Parser:
Users who are interested in Detectron-Layout-Parser are comparing it to the libraries listed below.
- Logical structure analysis for visually structured documents ☆ 87 · Updated 2 years ago
- ☆ 355 · Updated last year
- Lightweight, performant, deep table extraction ☆ 440 · Updated this week
- ☆ 12 · Updated 4 years ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆ 283 · Updated 2 weeks ago
- `pdfstructure` detects, splits, and organizes the document's text content into its natural structure as envisioned by the author. ☆ 103 · Updated last year
- Viewer for the structure extracted by Grobid on PDF documents ☆ 47 · Updated last month
- ☆ 131 · Updated last year
- Python bindings to PDFium ☆ 552 · Updated 2 weeks ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆ 68 · Updated 8 months ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆ 440 · Updated last year
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆ 329 · Updated 2 years ago
- A curated list of resources around PDF files ☆ 123 · Updated 7 months ago
- End to end solution for migrating CSV data into a Neo4j graph using an LLM for the data discovery and graph data modeling stages. ☆ 123 · Updated 3 months ago
- Parsing pdf tables using YOLOV3 ☆ 116 · Updated 4 years ago
- Python package that adds IntelligentGraph capabilities to RDFLib RDF graph package ☆ 55 · Updated last year
- ☆ 120 · Updated last month
- A Python library to define and validate data types in Docling. ☆ 96 · Updated this week
- Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks. ☆ 120 · Updated last year
- Benchmarking PDF libraries ☆ 268 · Updated last year
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ☆ 404 · Updated 6 months ago
- Repository for deepdoctection tutorial notebooks ☆ 43 · Updated 4 months ago
- How to construct knowledge graphs from unstructured data sources ☆ 119 · Updated 6 months ago
- Extracting Semi-Structured Data from PDFs on a large scale ☆ 51 · Updated 2 years ago
- TF-ID: Table/Figure IDentifier for academic papers ☆ 230 · Updated 8 months ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆ 46 · Updated 5 months ago
- ☆ 90 · Updated 2 weeks ago
- UniTable: Towards a Unified Table Foundation Model ☆ 452 · Updated 9 months ago
- Examples using the Deep Search functionalities ☆ 69 · Updated 2 months ago
- Graph based retrieval + GenAI = Better RAG in production ☆ 208 · Updated 8 months ago