getomni-ai / benchmark
OCR Benchmark
☆613 · Oct 21, 2025 · Updated 3 months ago
Alternatives and similar repositories for benchmark
Users who are interested in benchmark are comparing it to the libraries listed below.
- Evaluation framework for document processing models and services. ☆63 · Updated this week
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,479 · Dec 19, 2025 · Updated last month
- ☆47 · Dec 16, 2022 · Updated 3 years ago
- OCR & Document Extraction using vision models ☆12,136 · May 20, 2025 · Updated 8 months ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆16,860 · Feb 5, 2026 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,228 · Feb 4, 2026 · Updated last week
- Knowledge Agents and Management in the Cloud ☆4,231 · Feb 2, 2026 · Updated last week
- Get your documents ready for gen AI ☆52,799 · Updated this week
- SCIPE is a powerful tool for evaluating and diagnosing LLM (Large Language Model) graphs or chains. ☆25 · Nov 5, 2024 · Updated last year
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,673 · Nov 7, 2025 · Updated 3 months ago
- The most accurate document search and store for building AI apps ☆3,471 · Updated this week
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆137 · Jan 3, 2024 · Updated 2 years ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆2,503 · Feb 3, 2026 · Updated last week
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,118 · Updated this week
- ☆18 · Aug 19, 2025 · Updated 5 months ago
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,853 · Jun 24, 2024 · Updated last year
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆13,915 · Updated this week
- ☆186 · Jan 23, 2026 · Updated 3 weeks ago
- Improved file parsing for LLMs ☆3,151 · Nov 13, 2024 · Updated last year
- The LLM Evaluation Framework ☆13,613 · Updated this week
- ☆36 · Oct 7, 2023 · Updated 2 years ago
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆5,850 · Updated this week
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM. ☆51,922 · Updated this week
- Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations,… ☆17,681 · Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy ☆31,582 · Updated this week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 · Dec 6, 2024 · Updated last year
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,130 · Updated this week
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆681 · May 20, 2025 · Updated 8 months ago
- Simple to install, powerful command-line based AI agent system for coding. ☆561 · Jan 7, 2026 · Updated last month
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆162 · May 31, 2024 · Updated last year
- High-performance retrieval engine for unstructured data ☆1,559 · Nov 10, 2025 · Updated 3 months ago
- DSPy: The framework for programming, not prompting, language models ☆32,156 · Updated this week
- Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL ☆2,660 · Updated this week
- Open Source AI Platform - AI Chat with advanced features that works with every LLM ☆17,405 · Updated this week
- Supercharge Your LLM Application Evaluations 🚀 ☆12,526 · Jan 31, 2026 · Updated last week
- Basic HTR concepts/modules to boost performance ☆37 · Nov 30, 2024 · Updated last year
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ☆2,840 · Feb 6, 2026 · Updated last week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,275 · Feb 21, 2025 · Updated 11 months ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆116 · Aug 26, 2024 · Updated last year