getomni-ai / benchmark
OCR Benchmark
☆470 · Updated 3 weeks ago
Alternatives and similar repositories for benchmark:
Users who are interested in benchmark are comparing it to the libraries listed below.
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆900 · Updated last week
- A hub for various industry-specific schemas to be used with VLMs. ☆503 · Updated last week
- ContextGem: Effortless LLM extraction from documents ☆115 · Updated this week
- Structured information extraction from documents ☆315 · Updated 7 months ago
- Fully neural approach for text chunking ☆343 · Updated last week
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆777 · Updated 3 months ago
- RAG Logger is an open-source logging tool designed specifically for Retrieval-Augmented Generation (RAG) applications. It serves as a lig… ☆222 · Updated 4 months ago
- Lightweight Nearest Neighbors with Flexible Backends ☆269 · Updated 2 months ago
- See Through Your Models ☆389 · Updated 2 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆926 · Updated this week
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆279 · Updated 2 weeks ago
- Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for desig… ☆915 · Updated 3 months ago
- Detect and extract tables to Markdown and CSV ☆743 · Updated 3 months ago
- ☆582 · Updated last week
- Automate computer tasks in Python ☆300 · Updated this week
- Your toolkit for autonomous, evolving agent ecosystems. Create, execute, govern, and evolve agents that learn from experience, collaborat… ☆426 · Updated 2 weeks ago
- Multi-modal OCR pipeline optimized for ML training (text, figures, math, tables, diagrams) ☆621 · Updated last week
- Integrate LLMs into any pipeline: fit/predict pattern, JSON-driven flows, and built-in concurrency support. ☆588 · Updated last month
- Deep Research for your internal data ☆313 · Updated last week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,224 · Updated 2 weeks ago
- Generic RAG framework to apply the power of LLMs to any given dataset ☆600 · Updated this week
- Fast Semantic Text Deduplication & Filtering ☆659 · Updated 2 weeks ago
- Fast State-of-the-Art Static Embeddings ☆1,589 · Updated this week
- A list of useful open-source tools and scrapers to gather data for LLMs ☆230 · Updated 2 months ago
- Extract structured text from PDFs quickly ☆471 · Updated 2 months ago
- A Kubernetes-deployable instance of GroundX for document parsing, storage, and search. ☆708 · Updated last week
- Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc.) int… ☆563 · Updated 2 months ago
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard, though. ☆544 · Updated last month
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆852 · Updated 7 months ago
- Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework ☆339 · Updated 5 months ago