yash9439 / Detectron-Layout-Parser
This code performs PDF layout analysis and optical character recognition (OCR) using the layoutparser library and the Tesseract OCR engine. It detects the layout of a PDF document and extracts text from specific regions. The code is divided into several sections, each serving a specific purpose.
☆18 Updated 2 years ago
Alternatives and similar repositories for Detectron-Layout-Parser
Users who are interested in Detectron-Layout-Parser are comparing it to the libraries listed below.
- ☆392 Updated 2 years ago
- Excel spreadsheet crawler and table parser for data extraction and querying ☆164 Updated 11 months ago
- PyMuPDF4LLM ☆1,277 Updated last week
- Extract structured text from pdfs quickly ☆661 Updated 7 months ago
- ☆201 Updated last week
- Streamlit PDF viewer ☆195 Updated last week
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) ☆485 Updated 6 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆103 Updated last year
- 📚 Process PDFs, Word documents and more with spaCy ☆847 Updated 11 months ago
- Logical structure analysis for visually structured documents ☆93 Updated 3 years ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆935 Updated last month
- Generalist and Lightweight Model for Relation Extraction (Extract any relationship types from text) ☆255 Updated 7 months ago
- Lightweight, performant, deep table extraction ☆524 Updated 3 weeks ago
- SUQL: Conversational Search over Structured and Unstructured Data with LLMs ☆297 Updated 2 weeks ago
- ☆26 Updated 10 months ago
- Automated knowledge graph creation SDK ☆124 Updated last year
- Developer APIs to Accelerate LLM Projects ☆1,742 Updated last year
- Docx tracked change redlines for the Python ecosystem. ☆103 Updated 2 weeks ago
- Simple package to extract text with coordinates from programmatic PDFs ☆238 Updated this week
- RAG Citation enhances Retrieval-Augmented Generation (RAG) by automatically generating relevant citations for AI-generated content. It en… ☆49 Updated last year
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,478 Updated 5 months ago
- Docling core data types and transformations ☆225 Updated last week
- collection of text2cypher datasets, evaluations, and finetuning instructions ☆223 Updated last year
- ☆142 Updated 2 years ago
- `pdfstructure` detects, splits and organizes the documents text content into its natural structure as envisioned by the author. ☆105 Updated last year
- noslegal taxonomy facets and release notes ☆42 Updated 5 months ago
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆248 Updated last year
- ☆248 Updated 7 months ago
- A Structured Output Framework for LLM Outputs ☆376 Updated 2 months ago
- 🦜💯 Flex those feathers! ☆255 Updated last year