OCR Benchmark
☆632 · Oct 21, 2025 · Updated 7 months ago
Alternatives and similar repositories for benchmark
Users who are interested in benchmark are comparing it to the libraries listed below.
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,759 · May 6, 2026 · Updated 3 weeks ago
- ☆12 · Apr 27, 2026 · Updated last month
- Evaluation framework for document processing models and services. ☆73 · May 15, 2026 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,756 · May 6, 2026 · Updated 3 weeks ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,336 · Mar 25, 2026 · Updated 2 months ago
- OCR & Document Extraction using vision models ☆12,233 · May 20, 2025 · Updated last year
- Knowledge Agents and Management in the Cloud ☆4,254 · May 18, 2026 · Updated last week
- Get your documents ready for gen AI ☆60,372 · Updated this week
- World's first Nintendo 3DS emulator for Apple devices based on Citra. ☆18 · Apr 7, 2023 · Updated 3 years ago
- ☆19 · Aug 19, 2025 · Updated 9 months ago
- A CLI tool and library written in Go for converting documents to Markdown format. ☆26 · Sep 27, 2025 · Updated 7 months ago
- SCIPE is a powerful tool for evaluating and diagnosing LLM (Large Language Model) graphs or chains. ☆25 · Nov 5, 2024 · Updated last year
- Model Context Protocol Server that allows AI models to interact with JigsawStack models! ☆23 · Jul 11, 2025 · Updated 10 months ago
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,749 · May 18, 2026 · Updated last week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,852 · Nov 7, 2025 · Updated 6 months ago
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,158 · Updated this week
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆2,637 · May 19, 2026 · Updated last week
- The most accurate document search and store for building AI apps ☆3,599 · May 11, 2026 · Updated 2 weeks ago
- ☆48 · Dec 16, 2022 · Updated 3 years ago
- "fast" sqlite to parquet and csv converter ☆31 · Nov 5, 2025 · Updated 6 months ago
- ☆204 · May 8, 2026 · Updated 2 weeks ago
- Improved file parsing for LLM’s ☆3,160 · May 17, 2026 · Updated last week
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆6,094 · May 12, 2026 · Updated 2 weeks ago
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,905 · Jun 24, 2024 · Updated last year
- ☆19 · Jul 7, 2025 · Updated 10 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,607 · Updated this week
- NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extracti… ☆2,923 · May 20, 2026 · Updated last week
- The LLM Evaluation Framework ☆15,681 · Updated this week
- Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally. ☆65,100 · Updated this week
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ☆3,883 · May 4, 2026 · Updated 3 weeks ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆117 · Aug 26, 2024 · Updated last year
- Convert PDF to markdown + JSON quickly with high accuracy ☆35,381 · May 5, 2026 · Updated 3 weeks ago
- DSPy: The framework for programming—not prompting—language models ☆34,631 · Updated this week
- Notebooks to demonstrate TimmWrapper ☆16 · Jan 16, 2025 · Updated last year
- A curated list of resources on Document Layout Analysis ☆12 · Aug 7, 2025 · Updated 9 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆162 · May 31, 2024 · Updated last year
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆683 · May 13, 2026 · Updated last week
- High-performance retrieval engine for unstructured data ☆1,583 · Nov 10, 2025 · Updated 6 months ago
- Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs ☆2,925 · Mar 22, 2026 · Updated 2 months ago