docling-project/docling-parse

Simple package to extract text with coordinates from programmatic PDFs

☆326

Alternatives and similar repositories for docling-parse

Users that are interested in docling-parse are comparing it to the libraries listed below.

docling-project / docling-ibm-models
☆207 · Updated this week
docling-project / docling-core
Docling core data types and transformations
☆271 · Updated this week
DS4SD / deepsearch-glm
Create fast graph language models from converted PDF documents for knowledge extraction and Q&A.
☆60 · Jan 27, 2025 · Updated last year
docling-project / docling-sdg
A set of tools to create synthetically-generated data from documents
☆48 · Aug 15, 2025 · Updated 11 months ago
docling-project / docling-eval
Evaluation framework for document processing models and services.
☆77 · Jul 16, 2026 · Updated last week
docling-project / docling-serve
Running Docling as an API service
☆1,700 · Updated this week
docling-project / docling-mcp
Making docling agentic through MCP
☆695 · Updated this week
docling-project / docling-jobkit
☆34 · Updated this week
docling-project / docling-operator
☆16 · Apr 8, 2026 · Updated 3 months ago
DS4SD / quackling
Build document-native LLM applications
☆58 · Sep 11, 2024 · Updated last year
docling-project / docling-graph
Transform unstructured documents into validated, rich and queryable knowledge graphs.
☆181 · Updated this week
DS4SD / deepsearch-toolkit
Interact with the Deep Search platform for new knowledge explorations and discoveries
☆228 · Jan 24, 2025 · Updated last year
docling-project / docling-langchain
Docling LangChain integration
☆74 · Nov 17, 2025 · Updated 8 months ago
docling-project / docling
Get your documents ready for gen AI
☆63,762 · Updated this week
DS4SD / ragnardoc
☆22 · Feb 1, 2025 · Updated last year
docling-project / docling-agent
Agent that reads, writes, and edits documents.
☆67 · Jul 1, 2026 · Updated 3 weeks ago
DS4SD / deepsearch-examples
Examples using the Deep Search functionalities
☆90 · Jan 29, 2025 · Updated last year
pypdfium2-team / pypdfium2
Python bindings to PDFium, reasonably cross-platform.
☆801 · Updated this week
datalab-to / pdftext
Extract structured text from pdfs quickly
☆708 · Jul 8, 2026 · Updated 2 weeks ago
dhdaines / playa
Parallel and LAzY Analyzer for PDFs 🏖️
☆47 · Apr 28, 2026 · Updated 2 months ago
CreaLabs / Enhanced-BGE-M3-with-CLP-and-MoE
This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…
☆11 · Dec 27, 2024 · Updated last year
microsoft / SmartWordSuggestions
Repo for "Smart Word Suggestions" (SWS) task and benchmark
☆20 · Dec 4, 2023 · Updated 2 years ago
explosion / spacy-layout
📚 Process PDFs, Word documents and more with spaCy
☆909 · Mar 27, 2026 · Updated 3 months ago
InternScience / StructEqTable-Deploy
A High-efficiency Open-source Toolkit for Table-to-Latex Task
☆276 · Dec 6, 2025 · Updated 7 months ago
docling-project / docling-haystack
Docling Haystack integration
☆29 · Apr 9, 2026 · Updated 3 months ago
data-prep-kit / data-prep-kit
Open source project for data preparation for GenAI applications
☆949 · Jul 14, 2026 · Updated last week
davanstrien / ocr-bench
Per-collection OCR leaderboards using VLM-as-judge
☆68 · Jul 16, 2026 · Updated last week
huridocs / pdf-document-layout-analysis
A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic…
☆1,273 · Jul 13, 2026 · Updated last week
PhialsBasement / GUI-MCP
A Blueprint-style visual node editor for creating FastMCP servers. Build MCP tools, resources, and prompts by connecting nodes - no codin…
☆25 · Dec 8, 2025 · Updated 7 months ago
dhdaines / paves
Beneath the paving stones, the PLAYA 🏖️
☆17 · Jul 3, 2026 · Updated 3 weeks ago
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,843 · Updated this week
hirmeos / entity-fishing-client-python
Repository hosting the common code for the entity-fishing clients
☆10 · May 18, 2026 · Updated 2 months ago
opendatalab / OmniDocBench
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation
☆1,914 · Updated this week
JoelNiklaus / LegalDatasets
This repository serves as a collection of scrapers procuring and structuring various legal datasets
☆19 · Jun 16, 2023 · Updated 3 years ago
unclecode / gitin
A CLI that converts a GitHub repo into one text file for LLMs.
☆15 · Jan 2, 2025 · Updated last year
IBM / SynthTabNet
Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files
☆154 · Sep 17, 2025 · Updated 10 months ago
pymupdf / PyMuPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
☆10,315 · Updated this week
drmingler / docling-api
Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) int…
☆879 · Mar 4, 2025 · Updated last year
TencentCloudADP / youtu-parsing
Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding
☆69 · Jun 15, 2026 · Updated last month