dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆88 · Updated 3 months ago
Alternatives and similar repositories for pdf-extraction-agenda
Users that are interested in pdf-extraction-agenda are comparing it to the libraries listed below
- ☆91 · Updated last year
- Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical … ☆602 · Updated 2 weeks ago
- Effective LLM Alignment Toolkit ☆144 · Updated 3 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆452 · Updated 9 months ago
- Enterprise RAG Challenge to test the accuracy of different LLM-driven assistants ☆232 · Updated 6 months ago
- Multilingual RAG benchmark. ☆11 · Updated 10 months ago
- Tools and agents for automated research. ☆38 · Updated this week
- Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine. ☆382 · Updated 5 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆890 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆198 · Updated 3 weeks ago
- ☆33 · Updated 5 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,346 · Updated last month
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆750 · Updated 4 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament… ☆62 · Updated last year
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025) ☆446 · Updated 3 months ago
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on Russian language ☆45 · Updated 6 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode… ☆33 · Updated this week
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆864 · Updated 3 weeks ago
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆244 · Updated 2 weeks ago
- ☆151 · Updated last month
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆377 · Updated last month
- Fast Semantic Text Deduplication & Filtering ☆810 · Updated last month
- Library for industrial alignment. ☆401 · Updated 2 weeks ago
- The tiniest sentence encoder for Russian language ☆242 · Updated last year
- AI news kept as up to date as possible, plus research write-ups from ChatGPT ☆22 · Updated 3 months ago
- Telegram bot for different language models. Supports system prompts and images ☆60 · Updated 3 months ago
- ASR over WebSocket with FastAPI and Sherpa-onnx. Can use Vosk5 and GigaAM ☆13 · Updated last month
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆263 · Updated 9 months ago
- 🦖 X—LLM: Cutting Edge & Easy LLM Finetuning ☆407 · Updated last year
- PyMuPDF4LLM ☆1,067 · Updated last week