dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆93Updated last month
Alternatives and similar repositories for pdf-extraction-agenda
Users who are interested in pdf-extraction-agenda are comparing it to the libraries listed below
- ☆94Updated last year
- Code for explaining and evaluating late chunking (chunked pooling)☆470Updated 11 months ago
- Multilingual RAG benchmark.☆11Updated last year
- Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical …☆630Updated this week
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.☆506Updated last month
- Effective LLM Alignment Toolkit☆150Updated 5 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,237Updated last week
- Fast Semantic Text Deduplication & Filtering☆852Updated last month
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants☆246Updated 8 months ago
- ASR over WebSocket and POST/GET endpoints with FastAPI. Supports many Russian ASR models.☆18Updated this week
- Tools and agents for automated research.☆47Updated this week
- Simple package to extract text with coordinates from programmatic PDFs☆221Updated last week
- PyMuPDF4LLM☆1,160Updated last week
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro…☆892Updated 2 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament…☆62Updated last year
- ☆33Updated 7 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning.☆758Updated 6 months ago
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025)☆451Updated 6 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.☆1,581Updated 6 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code.☆834Updated 10 months ago
- UniTable: Towards a Unified Table Foundation Model☆514Updated last year
- ☆241Updated 6 months ago
- Telegram bot for different language models. Supports system prompts and images☆63Updated 5 months ago
- Knowledge Graph Generation from Any Text☆790Updated this week
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode…☆38Updated 2 months ago
- A High-efficiency Open-source Toolkit for Table-to-LaTeX Task☆272Updated this week
- Lightweight, performant, deep table extraction☆518Updated 4 months ago
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy☆1,420Updated last week
- "Руформеры" (Ruformers): a list of popular transformer-based base models for Russian natural language processing tasks☆38Updated 2 years ago
- A Python library to define and validate data types in Docling.☆215Updated this week