dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆94 · Updated 2 months ago
Alternatives and similar repositories for pdf-extraction-agenda
Users that are interested in pdf-extraction-agenda are comparing it to the libraries listed below
- ☆93 · Updated last year
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants ☆259 · Updated 9 months ago
- Dedoc is a library (service) for automate documents parsing and bringing to a uniform format. It automatically extracts content, logical … ☆640 · Updated last month
- Effective LLM Alignment Toolkit ☆152 · Updated 6 months ago
- Multilingual RAG benchmark. ☆11 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆539 · Updated 2 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,385 · Updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆482 · Updated last year
- Tools and agents for automated research. ☆47 · Updated last month
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆270 · Updated 3 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆230 · Updated last week
- Embedding Studio is a framework which allows you transform your Vector Database into a feature-rich Search Engine. ☆383 · Updated 8 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆274 · Updated last month
- Parse PDFs into markdown using Vision LLMs ☆457 · Updated 3 months ago
- Fast Semantic Text Deduplication & Filtering ☆866 · Updated this week
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆87 · Updated last year
- ASR on WS, POST/GET FAST_API Can use many RU asr models. ☆18 · Updated 3 weeks ago
- Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER) ☆87 · Updated last year
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament… ☆62 · Updated last year
- This repository measures the quality of Yandexgpt, Gigachat, T-Pro, Saiga, Vikhr, and Ruadapt on popular English-language benchmarks: MGSM, MATH, HumanE… ☆24 · Updated 9 months ago
- UniTable: Towards a Unified Table Foundation Model ☆521 · Updated last year
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆297 · Updated 5 months ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode… ☆38 · Updated this week
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025) ☆456 · Updated 7 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,456 · Updated 3 weeks ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆165 · Updated last year
- Training and data processing code for Saiga ☆54 · Updated 2 weeks ago
- ☆33 · Updated 9 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆70 · Updated last year
- 🦖 X—LLM: Cutting Edge & Easy LLM Finetuning ☆407 · Updated 2 years ago