OCR Benchmark
☆630 · Oct 21, 2025 · Updated 5 months ago
Alternatives and similar repositories for benchmark
Users who are interested in benchmark are comparing it to the libraries listed below.
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation · ☆1,649 · Updated this week
- ☆12 · Apr 8, 2026 · Updated last week
- Evaluation framework for document processing models and services · ☆67 · Apr 2, 2026 · Updated 2 weeks ago
- OCR, layout analysis, reading order, table recognition in 90+ languages · ☆19,588 · Updated this week
- Toolkit for linearizing PDFs for LLM datasets/training · ☆17,120 · Mar 25, 2026 · Updated 3 weeks ago
- OCR & Document Extraction using vision models · ☆12,200 · May 20, 2025 · Updated 10 months ago
- Knowledge Agents and Management in the Cloud · ☆4,245 · Mar 25, 2026 · Updated 3 weeks ago
- Get your documents ready for gen AI · ☆57,709 · Updated this week
- ☆19 · Aug 19, 2025 · Updated 7 months ago
- A CLI tool and library written in Go for converting documents to Markdown format · ☆25 · Sep 27, 2025 · Updated 6 months ago
- SCIPE is a powerful tool for evaluating and diagnosing LLM (Large Language Model) graphs or chains · ☆25 · Nov 5, 2024 · Updated last year
- Model Context Protocol Server that allows AI models to interact with JigsawStack models! · ☆23 · Jul 11, 2025 · Updated 9 months ago
- Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean… · ☆14,425 · Updated this week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API · ☆7,764 · Nov 7, 2025 · Updated 5 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol · ☆2,586 · Apr 6, 2026 · Updated last week
- Structured data extraction and instruction calling with ML, LLM and Vision LLM · ☆5,151 · Mar 29, 2026 · Updated 2 weeks ago
- The most accurate document search and store for building AI apps · ☆3,568 · Apr 2, 2026 · Updated 2 weeks ago
- ☆48 · Dec 16, 2022 · Updated 3 years ago
- ☆198 · Apr 6, 2026 · Updated last week
- "fast" sqlite to parquet and csv converter · ☆31 · Nov 5, 2025 · Updated 5 months ago
- Improved file parsing for LLMs · ☆3,155 · Nov 13, 2024 · Updated last year
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning · ☆6,006 · Mar 29, 2026 · Updated 2 weeks ago
- DocLLM: A layout-aware generative language model for multimodal document understanding · ☆140 · Jan 3, 2024 · Updated 2 years ago
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… · ☆2,887 · Jun 24, 2024 · Updated last year
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows · ☆12,395 · Apr 8, 2026 · Updated last week
- ☆18 · Jul 7, 2025 · Updated 9 months ago
- NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extracti… · ☆2,900 · Apr 9, 2026 · Updated last week
- The LLM Evaluation Framework · ☆14,728 · Apr 9, 2026 · Updated last week
- Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally · ☆61,312 · Updated this week
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling · ☆3,854 · Apr 9, 2026 · Updated last week
- ReadingBank: A Benchmark Dataset for Reading Order Detection · ☆116 · Aug 26, 2024 · Updated last year
- Convert PDF to markdown + JSON quickly with high accuracy · ☆33,701 · Updated this week
- DSPy: The framework for programming (not prompting) language models · ☆33,649 · Updated this week
- Notebooks to demonstrate TimmWrapper · ☆16 · Jan 16, 2025 · Updated last year
- A curated list of resources on Document Layout Analysis · ☆12 · Aug 7, 2025 · Updated 8 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) · ☆163 · May 31, 2024 · Updated last year
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) · ☆682 · May 20, 2025 · Updated 10 months ago
- High-performance retrieval engine for unstructured data · ☆1,573 · Nov 10, 2025 · Updated 5 months ago
- Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs · ☆2,905 · Mar 22, 2026 · Updated 3 weeks ago