OCR Benchmark
☆633 · Oct 21, 2025 · Updated 7 months ago
Alternatives and similar repositories for benchmark
Users who are interested in benchmark are comparing it to the libraries listed below.
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,806 · Updated this week
- ☆12 · Apr 27, 2026 · Updated last month
- Evaluation framework for document processing models and services. ☆75 · May 28, 2026 · Updated 2 weeks ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆20,780 · Updated this week
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,387 · Mar 25, 2026 · Updated 2 months ago
- OCR & Document Extraction using vision models ☆12,242 · May 20, 2025 · Updated last year
- Knowledge Agents and Management in the Cloud ☆4,251 · May 18, 2026 · Updated 3 weeks ago
- Get your documents ready for gen AI ☆61,291 · Updated this week
- World's first Nintendo 3DS emulator for Apple devices based on Citra. ☆18 · Apr 7, 2023 · Updated 3 years ago
- ☆19 · Aug 19, 2025 · Updated 9 months ago
- A CLI tool and library written in Go for converting documents to Markdown format. ☆26 · Sep 27, 2025 · Updated 8 months ago
- SCIPE is a powerful tool for evaluating and diagnosing LLM (Large Language Model) graphs or chains. ☆25 · Nov 5, 2024 · Updated last year
- Model Context Protocol Server that allows AI models to interact with JigsawStack models! ☆23 · Jul 11, 2025 · Updated 11 months ago
- Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean… ☆14,897 · Updated this week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,881 · Nov 7, 2025 · Updated 7 months ago
- Structured data extraction, instruction calling and agentic workflows with ML, LLM and Vision LLM ☆5,161 · Updated this week
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆2,668 · Updated this week
- The most accurate document search and store for building AI apps ☆3,610 · May 11, 2026 · Updated last month
- ☆48 · Dec 16, 2022 · Updated 3 years ago
- ☆206 · Jun 4, 2026 · Updated last week
- Improved file parsing for LLMs ☆3,163 · May 17, 2026 · Updated 3 weeks ago
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆6,137 · Updated this week
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆143 · Jan 3, 2024 · Updated 2 years ago
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,911 · Jun 24, 2024 · Updated last year
- ☆19 · Jul 7, 2025 · Updated 11 months ago
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,642 · Jun 8, 2026 · Updated last week
- NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever Library … ☆2,936 · Updated this week
- The LLM Evaluation Framework ☆16,037 · Jun 9, 2026 · Updated last week
- Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally. ☆66,153 · Updated this week
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ☆3,891 · Jun 9, 2026 · Updated last week
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆117 · Aug 26, 2024 · Updated last year
- Convert PDF to markdown + JSON quickly with high accuracy ☆36,101 · Jun 6, 2026 · Updated last week
- DSPy: The framework for programming—not prompting—language models ☆34,958 · Updated this week
- Notebooks to demonstrate TimmWrapper ☆16 · Jan 16, 2025 · Updated last year
- A curated list of resources on Document Layout Analysis ☆12 · Aug 7, 2025 · Updated 10 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆162 · May 31, 2024 · Updated 2 years ago
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆682 · May 13, 2026 · Updated last month
- High-performance retrieval engine for unstructured data ☆1,584 · Nov 10, 2025 · Updated 7 months ago
- Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs ☆2,929 · Mar 22, 2026 · Updated 2 months ago