dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆84 · Updated last month
Alternatives and similar repositories for pdf-extraction-agenda
Users who are interested in pdf-extraction-agenda are comparing it to the libraries listed below:
- Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical … ☆588 · Updated last week
- ☆91 · Updated 10 months ago
- Enterprise RAG Challenge to test the accuracy of different LLM-driven assistants ☆225 · Updated 4 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆443 · Updated 8 months ago
- Effective LLM Alignment Toolkit ☆140 · Updated 2 months ago
- Multilingual RAG benchmark. ☆11 · Updated 9 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆752 · Updated last week
- Tools and agents for automated research. ☆35 · Updated this week
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode… ☆30 · Updated this week
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆244 · Updated 2 months ago
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025) ☆440 · Updated 2 months ago
- Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine. ☆382 · Updated 4 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆176 · Updated last week
- Knowledge Graph Generation from Any Text ☆569 · Updated last month
- Lightweight, performant, deep table extraction ☆503 · Updated 3 weeks ago
- Fast Semantic Text Deduplication & Filtering ☆795 · Updated 2 weeks ago
- A High-efficiency Open-source Toolkit for Table-to-LaTeX Task ☆258 · Updated 8 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆362 · Updated 2 weeks ago
- Telegram bot for different language models. Supports system prompts and images. ☆59 · Updated 2 months ago
- ☆31 · Updated 4 months ago
- Sentence Transformers API: An OpenAI-compatible embedding API server ☆65 · Updated 11 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,564 · Updated 4 months ago
- A Docker-powered service for PDF document layout analysis. It provides powerful and flexible PDF analysis. The servic… ☆670 · Updated this week
- ☆234 · Updated 2 months ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation ☆305 · Updated 10 months ago
- Parse PDFs into markdown using Vision LLMs ☆417 · Updated 6 months ago
- ☆137 · Updated last month
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆747 · Updated 3 months ago
- ☆48 · Updated last month
- GigaChain Telegram bot example for technical support ☆34 · Updated 8 months ago