PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
☆1,967 · Mar 13, 2026 · Updated last week
Alternatives and similar repositories for opendataloader-pdf
Users interested in opendataloader-pdf are comparing it to the libraries listed below.
- ☆48 · Jun 20, 2024 · Updated last year
- A simple JSON parser specifically designed to handle malformed JSON output from Large Language Models (LLMs) like GPT, Claude, and others… ☆26 · Jun 20, 2025 · Updated 9 months ago
- nanoRLHF: from-scratch journey into how LLMs and RLHF really work. ☆170 · Jan 23, 2026 · Updated last month
- ☆10 · Feb 14, 2025 · Updated last year
- Unified collection of Korean benchmark evaluation code (?) ☆20 · Nov 15, 2024 · Updated last year
- ☆902 · Mar 11, 2026 · Updated last week
- Python tool for converting files and office documents to Markdown. ☆90,728 · Mar 10, 2026 · Updated last week
- Get your documents ready for gen AI ☆55,944 · Updated this week
- A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi… ☆34,741 · Feb 25, 2026 · Updated 3 weeks ago
- Portable Memory Harness for Agents. Grounding the Autonomous Era ☆99 · Mar 12, 2026 · Updated last week
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆8,069 · Feb 15, 2026 · Updated last month
- This repository aims to develop CoT Steering based on CoT without Prompting. It focuses on enhancing the model’s latent reasoning capabil… ☆115 · Jun 25, 2025 · Updated 8 months ago
- The Python Implementation of CRISP: Clustering Multi-Vector Representations for Denoising and Pruning ☆27 · Jul 27, 2025 · Updated 7 months ago
- ☆19 · Oct 24, 2023 · Updated 2 years ago
- Turn any collection of documents into a knowledge graph. Extract entities and relationships via LLM, deduplicate with your approval, and … ☆440 · Feb 24, 2026 · Updated 3 weeks ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆1,097 · Mar 2, 2026 · Updated 2 weeks ago
- The most accurate document search and store for building AI apps ☆3,537 · Feb 25, 2026 · Updated 3 weeks ago
- [MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100% private RAG application on … ☆10,325 · Updated this week
- Dataset fingerprinting for AIBOM ☆15 · Feb 18, 2026 · Updated last month
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,008 · Mar 13, 2026 · Updated last week
- The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025. ☆8,862 · Dec 17, 2025 · Updated 3 months ago
- Petal is a native macOS menu bar app for fast, local-first audio transcription. ☆130 · Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,348 · Feb 21, 2025 · Updated last year
- mini cli search engine for your docs, knowledge bases, meeting notes, whatever. Tracking current sota approaches while being all local ☆15,220 · Mar 11, 2026 · Updated last week
- The AI Browser Automation Framework ☆21,583 · Updated this week
- Korean Named Entity Corpus ☆25 · May 12, 2023 · Updated 2 years ago
- OCR model that handles complex tables, forms, handwriting with full layout. ☆4,928 · Jan 13, 2026 · Updated 2 months ago
- ☆2,348 · Nov 29, 2025 · Updated 3 months ago
- A system for agentic LLM-powered data processing and ETL ☆3,690 · Mar 12, 2026 · Updated last week
- 🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl! ☆29,446 · Mar 11, 2026 · Updated last week
- Convert PDF to markdown + JSON quickly with high accuracy ☆32,617 · Mar 10, 2026 · Updated last week
- An open-source RAG-based tool for chatting with your documents. ☆25,205 · Mar 8, 2026 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,477 · Mar 1, 2026 · Updated 2 weeks ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆6,669 · Mar 13, 2026 · Updated last week
- 🪄 Create rich visualizations with AI ☆15,134 · Updated this week
- FinceptTerminal is a modern finance application offering advanced market analytics, investment research, and economic data tools, designe… ☆2,795 · Updated this week
- Build Real-Time Knowledge Graphs for AI Agents ☆23,786 · Updated this week
- ☆214 · Mar 12, 2026 · Updated last week
- Open-source, production-ready customer service with built-in evals and monitoring ☆1,438 · Jan 12, 2026 · Updated 2 months ago