dantetemplar / pdf-extraction-agenda
Overview of pipelines related to PDF to Markdown document processing.
☆90 · Updated last week
Alternatives and similar repositories for pdf-extraction-agenda
Users interested in pdf-extraction-agenda are comparing it to the libraries listed below.
- Dedoc is a library (service) for automating document parsing and bringing documents to a uniform format. It automatically extracts content, logical … ☆617 · Updated last month
- ☆94 · Updated last year
- Effective LLM Alignment Toolkit ☆145 · Updated 4 months ago
- Tools and agents for automated research. ☆41 · Updated 2 weeks ago
- Enterprise RAG Challenge to test the accuracy of different LLM-driven assistants ☆235 · Updated 7 months ago
- Multilingual RAG benchmark. ☆11 · Updated 11 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆453 · Updated 10 months ago
- Library for industrial alignment. ☆401 · Updated last month
- Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine. ☆381 · Updated 6 months ago
- ASR over WebSocket with FastAPI and Sherpa-onnx. Can use Vosk5 and GigaAM ☆13 · Updated 3 weeks ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,111 · Updated this week
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating fundament… ☆62 · Updated last year
- UniTable: Towards a Unified Table Foundation Model ☆510 · Updated last year
- MERA (Multimodal Evaluation for Russian-language Architectures) is a new open benchmark for the Russian language for evaluating SOTA mode… ☆33 · Updated 3 weeks ago
- Simple package to extract text with coordinates from programmatic PDFs ☆209 · Updated last week
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bounding boxes into reading order. ☆283 · Updated 2 months ago
- "Ruformers" ("Руформеры"): a list of popular transformer-based foundation models for automatic Russian-language processing tasks ☆39 · Updated last year
- Training and data processing code for Saiga ☆51 · Updated 3 months ago
- GigaChain Telegram bot example for technical support ☆36 · Updated 10 months ago
- SAGE: Spelling correction, corruption and evaluation for multiple languages ☆160 · Updated 10 months ago
- Modified Arena-Hard-Auto LLM evaluation toolkit with an emphasis on the Russian language ☆45 · Updated 7 months ago
- This repository measures the quality of Yandexgpt, Gigachat, T-Pro, Saiga, Vikhr, and Ruadapt on popular English-language benchmarks: MGSM, MATH, HumanE… ☆24 · Updated 6 months ago
- Fast Semantic Text Deduplication & Filtering ☆823 · Updated 3 weeks ago
- The tiniest sentence encoder for the Russian language ☆243 · Updated last year
- ☆33 · Updated 6 months ago
- Telegram bot for different language models. Supports system prompts and images ☆62 · Updated 4 months ago
- Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy ☆1,368 · Updated last month
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆400 · Updated this week
- LangChain-compatible integrations with YandexGPT and YandexGPT Embeddings ☆44 · Updated 6 months ago
- Augmentex: a library for augmenting texts with errors ☆67 · Updated last year
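The lexical-search entry in the list names a specific ranking function, BM25. To illustrate roughly what such a library computes, here is a minimal pure-Python sketch of Okapi BM25 scoring over pre-tokenized documents. It does not use or reproduce that library's API. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the choice of the non-negative (Lucene-style) IDF variant are illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Hypothetical helper: score each tokenized document in `corpus`
    against a tokenized `query` with Okapi BM25.

    Uses the non-negative IDF variant ln((N - df + 0.5) / (df + 0.5) + 1).
    """
    n_docs = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturating term-frequency component, weighted by IDF.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


corpus = [
    ["pdf", "to", "markdown"],
    ["bm25", "lexical", "search"],
    ["markdown", "table", "parsing"],
]
print(bm25_scores(["markdown"], corpus))
```

Documents that do not contain any query term score zero, and the `k1` and `b` parameters control term-frequency saturation and document-length normalization respectively. A production library would precompute the per-term statistics once and vectorize the scoring rather than loop in Python.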