DS4SD / docling-ibm-models

☆65

Alternatives and similar repositories for docling-ibm-models:

Users who are interested in docling-ibm-models are comparing it to the libraries listed below.

DS4SD / docling-parse
Simple package to extract text with coordinates from programmatic PDFs
☆48 Updated this week
DS4SD / docling-core
A Python library to define and validate data types in Docling.
☆56 Updated this week
DS4SD / quackling
Build document-native LLM applications
☆52 Updated 4 months ago
DS4SD / deepsearch-glm
Create fast graph language models from converted PDF documents for knowledge extraction and Q&A.
☆33 Updated last month
DS4SD / deepsearch-examples
Examples using the Deep Search functionalities
☆56 Updated this week
DS4SD / docling-serve
Running Docling as an API service
☆44 Updated last month
dswang2011 / DocLLM
DocLLM: A layout-aware generative language model for multimodal document understanding
☆119 Updated last year
agamm / semantic-split
A Python library to chunk/group your texts based on semantic similarity.
☆90 Updated 6 months ago
DS4SD / DocLayNet
DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis
☆302 Updated last year
run-llama / llama_extract
☆109 Updated this week
ppaanngggg / yolo-doclaynet
YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis
☆81 Updated last week
illuin-tech / vidore-benchmark
Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.
☆164 Updated last month
LynnHaDo / Document-Layout-Analysis
Object Detection Model for Scanned Documents
☆86 Updated last year
s-emanuilov / litepali
LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.
☆34 Updated 3 months ago
Ashufet / Superior-RAG-for-Complex-PDFs-using-LlamaParse
I have explained how to create superior RAG pipeline for complex pdfs using LlamaParse. We can extract text and tables from pdf and QA on…
☆40 Updated 10 months ago
huridocs / pdf-document-layout-analysis
A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic…
☆232 Updated 3 weeks ago
opendatalab / OmniDocBench
A Comprehensive Benchmark for Document Parsing and Evaluation
☆199 Updated this week
umarbutler / semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
☆222 Updated last week
stephenleo / llm-structured-output-benchmarks
Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task…
☆139 Updated 3 months ago
SCUT-DLVCLab / Document-AI-Recommendations
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
☆174 Updated last month
brandonstarxel / chunking_evaluation
This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.…
☆204 Updated 3 months ago
nttmdlab-nlp / InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
☆155 Updated 7 months ago
plaggy / rag-containers
Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB.
☆55 Updated 3 weeks ago
TIGER-AI-Lab / LongRAG
Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".
☆209 Updated 4 months ago
predlico / ARAGOG
ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…
☆101 Updated 9 months ago
andreagemelli / doc2graph
Doc2Graph transforms documents into graphs and exploits a GNN to solve several tasks.
☆117 Updated last year
cxcscmu / RAGViz
Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]
☆76 Updated this week
ai8hyf / TF-ID
TF-ID: Table/Figure IDentifier for academic papers
☆228 Updated 6 months ago
moured / YOLOv10-Document-Layout-Analysis
YOLOv10 trained on DocLayNet dataset.
☆69 Updated 2 months ago
Unstructured-IO / unstructured-python-client
A Python client for the Unstructured hosted API
☆87 Updated this week