Simple package to extract text with coordinates from programmatic PDFs
☆277 · May 21, 2026 · Updated this week
Alternatives and similar repositories for docling-parse
Users that are interested in docling-parse are comparing it to the libraries listed below.
- ☆204 · May 8, 2026 · Updated 2 weeks ago
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆59 · Jan 27, 2025 · Updated last year
- Docling core data types and transformations ☆255 · Updated this week
- A set of tools to create synthetically-generated data from documents ☆48 · Aug 15, 2025 · Updated 9 months ago
- Running Docling as an API service ☆1,529 · Updated this week
- ☆22 · Feb 1, 2025 · Updated last year
- Making docling agentic through MCP ☆619 · May 15, 2026 · Updated last week
- Examples using the Deep Search functionalities ☆87 · Jan 29, 2025 · Updated last year
- Build document-native LLM applications ☆58 · Sep 11, 2024 · Updated last year
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e… ☆11 · Dec 27, 2024 · Updated last year
- Parallel and LAzY Analyzer for PDFs 🏖️ ☆43 · Apr 28, 2026 · Updated 3 weeks ago
- Poor man's simple harvester for arXiv resources ☆14 · Jul 14, 2023 · Updated 2 years ago
- Evaluation framework for document processing models and services. ☆73 · May 15, 2026 · Updated last week
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆277 · Dec 6, 2025 · Updated 5 months ago
- Get your documents ready for gen AI ☆59,909 · May 18, 2026 · Updated last week
- LoRA supervised fine-tuning, RLHF (PPO) and RAG with llama-3-8B on the TLDR summarization dataset ☆14 · Feb 2, 2025 · Updated last year
- Repo for "Smart Word Suggestions" (SWS) task and benchmark ☆19 · Dec 4, 2023 · Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆26 · Feb 19, 2024 · Updated 2 years ago
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆19 · Jun 16, 2023 · Updated 2 years ago
- Dataset of PNG images from synthetically generated table layouts with annotations in JSONL files ☆154 · Sep 17, 2025 · Updated 8 months ago
- Repository hosting the common code for the entity-fishing clients ☆10 · May 18, 2026 · Updated last week
- Extract structured text from pdfs quickly ☆686 · Jun 11, 2025 · Updated 11 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆432 · Feb 1, 2023 · Updated 3 years ago
- Open source project for data preparation for GenAI applications ☆932 · May 15, 2026 · Updated last week
- ☆20 · Nov 14, 2023 · Updated 2 years ago
- 📚 Process PDFs, Word documents and more with spaCy ☆903 · Mar 27, 2026 · Updated last month
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆320 · Aug 15, 2025 · Updated 9 months ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆195 · May 31, 2024 · Updated last year
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications. ☆41 · Apr 7, 2025 · Updated last year
- Open Access PDF harvester ☆42 · May 3, 2024 · Updated 2 years ago
- ☆209 · Apr 29, 2026 · Updated 3 weeks ago
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆62 · May 3, 2024 · Updated 2 years ago
- ☆102 · Dec 23, 2024 · Updated last year
- library supporting NLP and CV research on scientific papers ☆795 · Nov 8, 2024 · Updated last year
- python package to parse pdfs with different parsers ☆268 · Sep 12, 2025 · Updated 8 months ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding ☆37 · Jul 22, 2025 · Updated 10 months ago
- ☆39 · Jul 14, 2024 · Updated last year
- Lightweight OpenCV-style API with pluggable AI inference backends (TensorFlow Lite, ONNX Runtime, MNN) for edge and mobile vision. ☆26 · Jan 26, 2026 · Updated 3 months ago
- ☆251 · Jun 10, 2025 · Updated 11 months ago