Simple package to extract text with coordinates from programmatic PDFs
☆245 Feb 25, 2026 Updated this week
Alternatives and similar repositories for docling-parse
Users who are interested in docling-parse are comparing it to the libraries listed below.
- ☆187 Feb 20, 2026 Updated last week
- Docling core data types and transformations ☆230 Updated this week
- Running Docling as an API service ☆1,279 Updated this week
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆220 Jan 24, 2025 Updated last year
- Build document-native LLM applications ☆56 Sep 11, 2024 Updated last year
- AG-UI 4 Java ☆22 Dec 15, 2025 Updated 2 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆275 Dec 6, 2025 Updated 2 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆312 Aug 15, 2025 Updated 6 months ago
- Using OpenVINO to speed up inference of PaddleOCR-VL model ☆25 Updated this week
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e… ☆11 Dec 27, 2024 Updated last year
- Small python package to measure OCR quality and other related metrics. ☆27 Feb 19, 2024 Updated 2 years ago
- EAST-inspired Tensorflow-based Text Detector ☆11 Feb 18, 2021 Updated 5 years ago
- Evaluation framework for document processing models and services. ☆64 Feb 12, 2026 Updated 2 weeks ago
- Poor man's simple harvester for arXiv resources ☆13 Jul 14, 2023 Updated 2 years ago
- ☆17 Mar 22, 2024 Updated last year
- Extract structured text from pdfs quickly ☆667 Jun 11, 2025 Updated 8 months ago
- Get your documents ready for gen AI ☆54,094 Feb 24, 2026 Updated last week
- Quarkus MockServer Extension ☆17 Feb 23, 2026 Updated last week
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding ☆27 Dec 18, 2025 Updated 2 months ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆195 May 31, 2024 Updated last year
- ☆102 Dec 23, 2024 Updated last year
- ☆31 Feb 23, 2026 Updated last week
- An open-source project building a customizable ChatGPT-like clone. Built with Django and Next.js, it features chat history, streaming res… ☆16 Mar 5, 2024 Updated last year
- 📚 Process PDFs, Word documents and more with spaCy ☆861 Mar 8, 2025 Updated 11 months ago
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆18 Jun 16, 2023 Updated 2 years ago
- ☆22 Feb 1, 2025 Updated last year
- [MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications. ☆41 Apr 7, 2025 Updated 10 months ago
- library supporting NLP and CV research on scientific papers ☆789 Nov 8, 2024 Updated last year
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆458 Sep 28, 2025 Updated 5 months ago
- Open source project for data preparation for GenAI applications ☆903 Feb 16, 2026 Updated 2 weeks ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding ☆36 Jul 22, 2025 Updated 7 months ago
- ☆20 Nov 4, 2024 Updated last year
- Resume Analyzer is a tool for recruiters which can help them to select candidates based on their resume and it also helps by providing a … ☆18 Aug 25, 2021 Updated 4 years ago
- ☆202 Feb 22, 2026 Updated last week
- Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) int… ☆753 Mar 4, 2025 Updated 11 months ago
- Flow Driven Domain Library, a spring Library that helps you develop DDD process-centric domains ☆11 Jan 27, 2024 Updated 2 years ago
- Shape your React, Next Component Library ☆21 Jul 15, 2025 Updated 7 months ago
- A Unified Toolkit for Deep Learning-Based Table Extraction ☆59 Nov 21, 2024 Updated last year
- ONNX-compatible DocShadow: High-Resolution Document Shadow Removal. Supports TensorRT 🚀 ☆25 Sep 13, 2023 Updated 2 years ago