dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆70Updated 3 weeks ago
Alternatives and similar repositories for pdf-extraction-agenda
Users who are interested in pdf-extraction-agenda are comparing it to the libraries listed below.
- Effective LLM Alignment Toolkit☆132Updated last month
- ☆32Updated 2 months ago
- Tools and agents for automated research.☆30Updated 2 weeks ago
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament…☆61Updated 8 months ago
- ☆88Updated 8 months ago
- Code for explaining and evaluating late chunking (chunked pooling)☆403Updated 6 months ago
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants☆191Updated 2 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆526Updated last month
- Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical …☆490Updated last month
- RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs☆17Updated 4 months ago
- Instruct LLMs for flat and nested NER. Fine-tuning Llama and Mistral models for instruction named entity recognition. (Instruction NER)☆84Updated last year
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode…☆26Updated 2 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆252Updated 2 weeks ago
- Sentence Transformers API: An OpenAI compatible embedding API server☆60Updated 9 months ago
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception☆195Updated 2 weeks ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆246Updated 6 months ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".☆235Updated 10 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆181Updated 9 months ago
- 🦖 X—LLM: Cutting Edge & Easy LLM Finetuning☆402Updated last year
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.☆213Updated 3 weeks ago
- Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.☆194Updated 3 months ago
- This repository measures the quality of Yandexgpt, Gigachat, T-Pro, Saiga, Vikhr, and Ruadapt on popular English-language benchmarks: MGSM, MATH, HumanE…☆23Updated 2 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.☆328Updated 2 weeks ago
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on Russian language☆42Updated 3 months ago
- "Руформеры" (Ruformers) - a list of popular transformer-based base models for automatic Russian language processing tasks☆36Updated last year
- A simple text normalizer for use before speech synthesis☆33Updated last year
- The official repository for the paper: Evaluation of Retrieval-Augmented Generation: A Survey.☆160Updated 2 months ago
- UniTable: Towards a Unified Table Foundation Model☆482Updated last year
- Augmentex — a library for augmenting texts with errors☆65Updated 11 months ago
- Bunch of notebooks for pre-training custom Saiga-like LLM☆13Updated last year