dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆93 · Updated 2 months ago
Alternatives and similar repositories for pdf-extraction-agenda
Users who are interested in pdf-extraction-agenda are comparing it to the libraries listed below.
- ☆94 · Updated last year
- Multilingual RAG benchmark. ☆11 · Updated last year
- Enterprise RAG Challenge to test accuracy of different LLM-driven assistants ☆255 · Updated 9 months ago
- Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical … ☆636 · Updated 2 weeks ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆477 · Updated last year
- Effective LLM Alignment Toolkit ☆151 · Updated 6 months ago
- Tools and agents for automated research. ☆47 · Updated 3 weeks ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,299 · Updated last week
- ASR over WebSocket (WS) and POST/GET endpoints via FastAPI; can use many Russian (RU) ASR models. ☆18 · Updated last week
- Hybrid Schema-Guided Reasoning (SGR): an agentic system design created by the neuraldeep community ☆898 · Updated this week
- Simple package to extract text with coordinates from programmatic PDFs ☆226 · Updated 3 weeks ago
- 🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. Our toolkit integrates 40 pr… ☆528 · Updated 2 months ago
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception ☆267 · Updated 3 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,584 · Updated last week
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025) ☆456 · Updated 6 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆519 · Updated 2 months ago
- ☆242 · Updated 6 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆69 · Updated last year
- Framework for enhancing LLMs for RAG tasks using fine-tuning. ☆762 · Updated 2 weeks ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation ☆310 · Updated last year
- Telegram bot for different language models. Supports system prompts and images. ☆63 · Updated 6 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆906 · Updated 3 months ago
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes (bboxes) into reading order. ☆292 · Updated 4 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,438 · Updated last week
- OpenAPI-like API server for voice generation (TTS) based on the fish-speech-1.5 model. ☆28 · Updated 7 months ago
- ☆33 · Updated 8 months ago
- The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval ☆1,513 · Updated last year
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament… ☆62 · Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆272 · Updated 3 weeks ago
- Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine. ☆383 · Updated 8 months ago