Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
★682 · May 20, 2025 · Updated 9 months ago
Alternatives and similar repositories for Versatile-OCR-Program
Users who are interested in Versatile-OCR-Program are comparing it to the libraries listed below.
- discover story relationships ★348 · Jun 24, 2025 · Updated 8 months ago
- The most accurate document search and store for building AI apps ★3,509 · Feb 19, 2026 · Updated last week
- Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking, and markdown formatting of scanned PDFs ★2,869 · Updated this week
- Fully neural approach for text chunking ★406 · Oct 23, 2025 · Updated 4 months ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ★2,738 · Jan 9, 2026 · Updated last month
- Fully open-source command-line AI assistant inspired by OpenAI Codex, supporting local language models. ★666 · Jul 7, 2025 · Updated 7 months ago
- Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit … ★362 · May 21, 2025 · Updated 9 months ago
- Toolkit for linearizing PDFs for LLM datasets/training ★16,947 · Feb 19, 2026 · Updated last week
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ★2,335 · Jun 9, 2025 · Updated 8 months ago
- Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯 ★885 · Dec 10, 2025 · Updated 2 months ago
- OCR & Document Extraction using vision models ★12,144 · May 20, 2025 · Updated 9 months ago
- Animating R1's thoughts. ★383 · Feb 17, 2025 · Updated last year
- A self-hosted API that takes a URL and returns a file with browser screenshots. ★1,147 · Mar 9, 2025 · Updated 11 months ago
- A cache for AI agents to learn and replay complex behaviors. ★758 · Jun 15, 2025 · Updated 8 months ago
- RAG Logger is an open-source logging tool designed specifically for Retrieval-Augmented Generation (RAG) applications. It serves as a lig… ★227 · Dec 24, 2024 · Updated last year
- PDF to markdown using vision LLMs: tables, layouts, and structure preserved ★886 · Feb 21, 2026 · Updated last week
- Have a natural, spoken conversation with AI! ★3,542 · Jul 11, 2025 · Updated 7 months ago
- Detect and extract tables to markdown and csv ★753 · Jan 24, 2025 · Updated last year
- Transcribe PDFs with local LLMs ★818 · Jan 27, 2026 · Updated last month
- Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages. ★1,690 · Dec 21, 2024 · Updated last year
- Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more. ★281 · Feb 13, 2026 · Updated 2 weeks ago
- A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documen… ★6,310 · Updated this week
- ★891 · May 13, 2025 · Updated 9 months ago
- Speech to Text but with all the bells and whistles and most importantly AI! AI will clean up your filler words, edit and will refine what… ★328 · Feb 9, 2025 · Updated last year
- AI reads books: Page-by-Page PDF Knowledge Extractor & Summarizer. script performs an intelligent page-by-page analysis of PDF books, met… ★1,574 · Jan 20, 2025 · Updated last year
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ★2,940 · Sep 24, 2025 · Updated 5 months ago
- Secretary is an AI-powered tool that analyzes social media content from specified accounts and delivers results via WeChat. It supports c… ★360 · Aug 4, 2025 · Updated 6 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ★19,360 · Updated this week
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ★2,851 · Updated this week
- ★267 · Nov 15, 2024 · Updated last year
- A JPEG Image Compression Service using Part Homomorphic Encryption. ★31 · Mar 7, 2025 · Updated 11 months ago
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ★1,096 · Aug 9, 2025 · Updated 6 months ago
- Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining" ★649 · Feb 24, 2025 · Updated last year
- Web scraper made with AI and simplicity in mind. It runs as a CLI that can be parallelized and outputs high-quality markdown content. ★541 · Nov 3, 2025 · Updated 3 months ago
- ★79 · Apr 15, 2025 · Updated 10 months ago
- Convert PDF to markdown + JSON quickly with high accuracy ★31,857 · Feb 9, 2026 · Updated 2 weeks ago
- Official Repo for "TheoremExplainAgent: Towards Video-based Multimodal Explanations for LLM Theorem Understanding" [ACL 2025 oral] ★1,462 · Jul 27, 2025 · Updated 7 months ago
- Hierarchical Navigable Small Worlds ★101 · Aug 8, 2025 · Updated 6 months ago
- Merliot Device Hub ★166 · Jun 11, 2025 · Updated 8 months ago