OCR Benchmark
☆626 · Oct 21, 2025 · Updated 5 months ago
Alternatives and similar repositories for benchmark
Users that are interested in benchmark are comparing it to the libraries listed below.
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,589 · Feb 27, 2026 · Updated 3 weeks ago
- ☆12 · Mar 11, 2026 · Updated 2 weeks ago
- Evaluation framework for document processing models and services. ☆66 · Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,506 · Mar 1, 2026 · Updated 3 weeks ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆17,043 · Mar 17, 2026 · Updated last week
- OCR & Document Extraction using vision models ☆12,191 · May 20, 2025 · Updated 10 months ago
- Knowledge Agents and Management in the Cloud ☆4,248 · Mar 16, 2026 · Updated last week
- Get your documents ready for gen AI ☆56,339 · Updated this week
- ☆19 · Aug 19, 2025 · Updated 7 months ago
- A CLI tool and library written in Go for converting documents to Markdown format. ☆24 · Sep 27, 2025 · Updated 5 months ago
- SCIPE is a powerful tool for evaluating and diagnosing LLM (Large Language Model) graphs or chains. ☆25 · Nov 5, 2024 · Updated last year
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,282 · Mar 16, 2026 · Updated last week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,741 · Nov 7, 2025 · Updated 4 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆2,563 · Mar 17, 2026 · Updated last week
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,142 · Updated this week
- The most accurate document search and store for building AI apps ☆3,541 · Feb 25, 2026 · Updated last month
- ☆48 · Dec 16, 2022 · Updated 3 years ago
- ☆194 · Mar 9, 2026 · Updated 2 weeks ago
- Improved file parsing for LLM’s ☆3,153 · Nov 13, 2024 · Updated last year
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆5,967 · Mar 15, 2026 · Updated last week
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆139 · Jan 3, 2024 · Updated 2 years ago
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,882 · Jun 24, 2024 · Updated last year
- ☆18 · Jul 7, 2025 · Updated 8 months ago
- NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extracti… ☆2,885 · Updated this week
- The LLM Evaluation Framework ☆14,227 · Updated this week
- Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally. ☆57,673 · Updated this week
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,322 · Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy ☆32,910 · Mar 10, 2026 · Updated 2 weeks ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆117 · Aug 26, 2024 · Updated last year
- DSPy: The framework for programming—not prompting—language models ☆33,038 · Updated this week
- Notebooks to demonstrate TimmWrapper ☆16 · Jan 16, 2025 · Updated last year
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ☆3,823 · Mar 18, 2026 · Updated last week
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024) ☆162 · May 31, 2024 · Updated last year
- High-performance retrieval engine for unstructured data ☆1,567 · Nov 10, 2025 · Updated 4 months ago
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆685 · May 20, 2025 · Updated 10 months ago
- Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs ☆2,891 · Mar 3, 2026 · Updated 3 weeks ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,346 · Feb 21, 2025 · Updated last year
- Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations,… ☆18,383 · Updated this week
- LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows ☆6,504 · Updated this week