dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆61 Updated last month
Alternatives and similar repositories for pdf-extraction-agenda
Users who are interested in pdf-extraction-agenda are comparing it to the libraries listed below.
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants ☆106 Updated last month
- Effective LLM Alignment Toolkit ☆128 Updated last month
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆424 Updated last month
- ☆47 Updated 2 weeks ago
- ☆88 Updated 7 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆382 Updated 4 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament… ☆62 Updated 7 months ago
- ☆27 Updated last month
- Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical … ☆236 Updated 3 weeks ago
- Implementation of my RAG system that won all categories in Enterprise RAG Challenge 2 ☆451 Updated last month
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆219 Updated 11 months ago
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆736 Updated 2 months ago
- ☆26 Updated last week
- SAGE: Spelling correction, corruption and evaluation for multiple languages ☆151 Updated 4 months ago
- A simple text normalizer for use before speech synthesis ☆32 Updated last year
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode… ☆25 Updated last month
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆156 Updated last month
- 🤗 Benchmark Large Language Models Reliably On Your Data ☆295 Updated this week
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ☆851 Updated last week
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025) ☆403 Updated 3 months ago
- Augmentex: a library for augmenting texts with errors ☆63 Updated 10 months ago
- Knowledge Graph Generation from Any Text ☆451 Updated this week
- This repository measures the quality of Yandexgpt, Gigachat, T-Pro, Saiga, Vikhr, Ruadapt on popular English-language benchmarks: MGSM, MATH, HumanE… ☆23 Updated 3 weeks ago
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on Russian language ☆42 Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,408 Updated this week
- This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs. ☆681 Updated last month
- GigaChain telegram bot example for technical support ☆30 Updated 4 months ago
- A python library to define and validate data types in Docling. ☆131 Updated last week
- The official repository for the paper: Evaluation of Retrieval-Augmented Generation: A Survey. ☆153 Updated 2 weeks ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆237 Updated 5 months ago