getomni-ai / benchmark
OCR Benchmark
☆603 Updated 2 months ago
Alternatives and similar repositories for benchmark
Users who are interested in benchmark are comparing it to the libraries listed below.
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆835 Updated 11 months ago
- Extract structured text from PDFs quickly ☆641 Updated 6 months ago
- Lightweight Nearest Neighbors with Flexible Backends ☆324 Updated 2 months ago
- Structured information extraction from documents ☆319 Updated last year
- OpenAI's Structured Outputs with Logprobs ☆200 Updated 7 months ago
- See Through Your Models ☆401 Updated 5 months ago
- 📝 Automatically annotate papers using LLMs ☆392 Updated 3 weeks ago
- Things you can do with the token embeddings of an LLM ☆1,449 Updated 3 weeks ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 ☆350 Updated 6 months ago
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆1,410 Updated 8 months ago
- A flexible, adaptive classification system for dynamic text classification ☆517 Updated 2 months ago
- Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an intuitive platform for desig… ☆933 Updated 11 months ago
- 🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data. ☆584 Updated last week
- RAG evaluation without the need for "golden answers" ☆330 Updated 2 weeks ago
- Fully neural approach for text chunking ☆404 Updated 2 months ago
- Deep Research for your internal data ☆351 Updated 6 months ago
- Fast Semantic Text Deduplication & Filtering ☆859 Updated 2 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆226 Updated 3 weeks ago
- TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inf… ☆209 Updated last month
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆519 Updated 2 months ago
- ☆174 Updated 3 weeks ago
- High-performance retrieval engine for unstructured data ☆1,545 Updated last month
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL ☆1,128 Updated 2 weeks ago
- Curate High Quality Datasets, Train, Evaluate and Ship! 🚀 ☆753 Updated this week
- RAG Logger is an open-source logging tool designed specifically for Retrieval-Augmented Generation (RAG) applications. It serves as a lig… ☆225 Updated last year
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆379 Updated 3 months ago
- Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework ☆341 Updated last year
- Examples and guides for using the VLM Run API ☆302 Updated this week
- A hub for various industry-specific schemas to be used with VLMs. ☆537 Updated 2 weeks ago
- Visualize Different Text Splitting Methods ☆312 Updated 11 months ago