dantetemplar / pdf-extraction-agenda
Overview of pipelines for PDF-to-Markdown document processing.
☆80 Updated last week
Alternatives and similar repositories for pdf-extraction-agenda
Users who are interested in pdf-extraction-agenda compare it to the libraries listed below.
- ☆88 Updated 9 months ago
- Effective LLM Alignment Toolkit ☆137 Updated 3 weeks ago
- Dedoc is a library (service) for automated document parsing that brings documents to a uniform format. It automatically extracts content, logical … ☆516 Updated this week
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆602 Updated 2 weeks ago
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants ☆209 Updated 3 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆419 Updated 6 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament… ☆61 Updated 9 months ago
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆233 Updated last month
- ☆32 Updated 3 months ago
- RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs ☆17 Updated 5 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode… ☆26 Updated 3 months ago
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025) ☆428 Updated last month
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆343 Updated last month
- Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER) ☆84 Updated last year
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆742 Updated last month
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆980 Updated 2 weeks ago
- Tools and agents for automated research. ☆32 Updated 2 weeks ago
- ☆48 Updated 2 weeks ago
- Library for industrial alignment. ☆396 Updated last week
- Top ML papers of the week. ☆33 Updated this week
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy ☆1,242 Updated last month
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on the Russian language ☆43 Updated 3 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆97 Updated last year
- ☆227 Updated last month
- This repository measures the quality of Yandexgpt, Gigachat, T-Pro, Saiga, Vikhr, and Ruadapt on popular English-language benchmarks: MGSM, MATH, HumanE… ☆23 Updated 3 months ago
- SAGE: Spelling correction, corruption and evaluation for multiple languages ☆156 Updated 6 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,499 Updated last month
- RAGChecker: A Fine-grained Framework For Diagnosing RAG ☆936 Updated 7 months ago
- TF-ID: Table/Figure IDentifier for academic papers ☆238 Updated last year
- The tiniest sentence encoder for the Russian language ☆231 Updated 11 months ago