yash9439 / Detectron-Layout-Parser
This code performs PDF layout analysis and optical character recognition (OCR) using the layoutparser library and the Tesseract OCR engine. It detects the layout of a PDF document and extracts text from specific regions. The code is divided into several sections, each serving a specific purpose.
☆13 Updated last year
Alternatives and similar repositories for Detectron-Layout-Parser:
Users who are interested in Detectron-Layout-Parser are comparing it to the libraries listed below.
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆232 Updated 3 weeks ago
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆35 Updated 3 months ago
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆170 Updated last week
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆119 Updated last year
- Examples using the Deep Search functionalities ☆56 Updated this week
- A Python library to define and validate data types in Docling. ☆57 Updated this week
- ☆167 Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆48 Updated this week
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ☆368 Updated 3 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆63 Updated 5 months ago
- Python package that adds IntelligentGraph capabilities to the RDFLib RDF graph package ☆55 Updated last year
- Collection of text2cypher datasets, evaluations, and finetuning instructions ☆153 Updated 7 months ago
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆33 Updated last month
- ☆65 Updated this week
- ☆11 Updated 3 years ago
- A Python library to chunk/group your texts based on semantic similarity. ☆90 Updated 6 months ago
- TF-ID: Table/Figure IDentifier for academic papers ☆228 Updated 6 months ago
- Graph based retrieval + GenAI = Better RAG in production ☆201 Updated 6 months ago
- Build document-native LLM applications ☆52 Updated 4 months ago
- ☆109 Updated this week
- LLM-driven automated knowledge graph construction from text using DSPy and Neo4j. ☆161 Updated 9 months ago
- ☆341 Updated last year
- Lightweight, performant, deep table extraction ☆387 Updated last month
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆140 Updated 3 months ago
- A spaCy wrapper for GliNER ☆101 Updated 6 months ago
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆719 Updated 2 months ago
- Data for the Chat With Your Data benchmark. ☆128 Updated last year
- Create a knowledge graph out of unstructured legal text - use said knowledge graph in a graph augmented retrieval augmented generation pipe… ☆29 Updated 3 months ago
- ☆18 Updated 9 months ago
- Running Docling as an API service ☆44 Updated last month