yash9439 / Detectron-Layout-Parser
This code performs PDF layout analysis and optical character recognition (OCR) using the layoutparser library and the Tesseract OCR engine. It detects the layout of a PDF document and extracts text from selected regions. The code is divided into several sections, each serving a specific purpose.
☆14 · Updated last year
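The pipeline described above (detect layout blocks, keep the regions of interest, OCR each one) can be sketched without the heavy dependencies. In layoutparser itself, detection is typically done with `lp.Detectron2LayoutModel(...).detect(image)` and OCR with `lp.TesseractAgent(languages="eng").detect(cropped_image)`. The sketch below is not this repository's code: it swaps both of those for stand-ins so that only the region-selection logic remains. The `Block` class, the `pad_box`/`select_regions`/`extract_text` helpers, the 0.8 score threshold, the 5-pixel padding, and the `ocr` callable are all hypothetical illustrations. The "Text"/"Title"/"Figure" labels follow the PubLayNet category names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]

@dataclass
class Block:
    """A detected layout region (hypothetical stand-in for a layoutparser block)."""
    x1: float
    y1: float
    x2: float
    y2: float
    type: str
    score: float

def pad_box(block: Block, pad: float, width: float, height: float) -> Box:
    """Grow the box by `pad` on every side, clamped to the page bounds."""
    return (
        max(0, block.x1 - pad),
        max(0, block.y1 - pad),
        min(width, block.x2 + pad),
        min(height, block.y2 + pad),
    )

def select_regions(blocks: Iterable[Block], keep_types: Iterable[str], min_score: float) -> List[Block]:
    """Keep confident blocks of the wanted types, ordered top-to-bottom, then left-to-right."""
    wanted = set(keep_types)
    kept = [b for b in blocks if b.type in wanted and b.score >= min_score]
    return sorted(kept, key=lambda b: (b.y1, b.x1))

def extract_text(
    blocks: Iterable[Block],
    page_size: Tuple[float, float],
    ocr: Callable[[Box], str],
    keep_types: Iterable[str] = ("Text", "Title"),
    min_score: float = 0.8,
    pad: float = 5,
) -> List[Tuple[str, str]]:
    """Run `ocr` on each selected (padded) region and return (block type, text) pairs."""
    width, height = page_size
    return [
        (b.type, ocr(pad_box(b, pad, width, height)))
        for b in select_regions(blocks, keep_types, min_score)
    ]

# Usage with made-up detections and a fake OCR function that echoes the crop box.
detections = [
    Block(50, 400, 550, 700, "Text", 0.95),
    Block(50, 100, 550, 150, "Title", 0.90),
    Block(300, 200, 550, 380, "Figure", 0.99),  # dropped: not a text region
    Block(50, 200, 280, 380, "Text", 0.50),     # dropped: below the score threshold
]
for kind, text in extract_text(detections, (600, 800), lambda box: f"text@{box}"):
    print(kind, text)
# prints:
# Title text@(45, 95, 555, 155)
# Text text@(45, 395, 555, 705)
```

Sorting by `(y1, x1)` only gives a rough top-to-bottom reading order; a two-column page would need its blocks split by column before sorting.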
Alternatives and similar repositories for Detectron-Layout-Parser:
Users who are interested in Detectron-Layout-Parser are comparing it to the libraries listed below.
- Extracting Semi-Structured Data from PDFs on a large scale ☆51 · Updated 2 years ago
- A Python library to chunk/group your texts based on semantic similarity. ☆96 · Updated 9 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆109 · Updated 2 weeks ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆70 · Updated 8 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆293 · Updated 3 weeks ago
- 📚 Process PDFs, Word documents and more with spaCy ☆559 · Updated last month
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ☆415 · Updated 6 months ago
- End-to-end solution for migrating CSV data into a Neo4j graph using an LLM for the data discovery and graph data modeling stages. ☆125 · Updated 4 months ago
- `pdfstructure` detects, splits and organizes the document's text content into its natural structure as envisioned by the author. ☆103 · Updated last year
- NVIDIA AI Blueprint for multimodal PDF data extraction for enterprise RAG ☆320 · Updated last month
- Streamlit PDF viewer ☆143 · Updated this week
- Software that makes labeling PDFs easy. ☆413 · Updated 11 months ago
- A spaCy wrapper for GliNER ☆112 · Updated 2 months ago
- A Python library to define and validate data types in Docling. ☆122 · Updated this week
- How to construct knowledge graphs from unstructured data sources ☆125 · Updated 6 months ago
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆877 · Updated 2 weeks ago
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ☆144 · Updated last year
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆293 · Updated last month
- Automated knowledge graph creation SDK ☆121 · Updated 4 months ago
- Making docling agentic through MCP ☆22 · Updated 2 weeks ago
- Excel spreadsheet crawler and table parser for data extraction and querying ☆132 · Updated last month
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆125 · Updated last year
- Repository for deepdoctection tutorial notebooks ☆44 · Updated 5 months ago
- 🦜💯 Flex those feathers! ☆245 · Updated 6 months ago