This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and includes implementations of several novel chunking strategies.
☆486 · Dec 13, 2025 · Updated 4 months ago
Alternatives and similar repositories for chunking_evaluation
Users that are interested in chunking_evaluation are comparing it to the libraries listed below.
- An Overview of the Latest Document Chunking Research ☆88 · Nov 25, 2024 · Updated last year
- ☆34 · Jun 17, 2024 · Updated last year
- Fast BM25 search in Python, powered by Numpy and Numba ☆1,656 · Updated this week
- Query Only Linear Adapter Training for Fine Tuned Embedding Model Query Representation ☆28 · Sep 12, 2024 · Updated last year
- ☆22 · Oct 14, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated 11 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆510 · Dec 23, 2024 · Updated last year
- Fork of OpenAI's Realtime Console, adapted for Vocal RAG ☆36 · Oct 18, 2024 · Updated last year
- Optimize Document Retrieval with Fine-Tuned KnowledgeBases ☆184 · Nov 5, 2025 · Updated 6 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆23 · Jun 30, 2025 · Updated 10 months ago
- ☆1,458 · Jun 18, 2024 · Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,918 · May 17, 2025 · Updated 11 months ago
- Evaluation framework for document processing models and services. ☆70 · Apr 27, 2026 · Updated last week
- Fast Multimodal Semantic Deduplication & Filtering ☆921 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,613 · Dec 20, 2025 · Updated 4 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 7 months ago
- Chunk your text using gpt4o-mini more accurately ☆44 · Aug 3, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations 🚀 ☆13,785 · Feb 24, 2026 · Updated 2 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆160 · Jul 14, 2025 · Updated 9 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆847 · Jan 28, 2025 · Updated last year
- Convert PowerPoint files into semantically rich text using vision language models ☆113 · Nov 12, 2025 · Updated 5 months ago
- ☆231 · Jan 18, 2026 · Updated 3 months ago
- Smart reproducible analytical pipeline inspection ☆21 · Feb 13, 2026 · Updated 2 months ago
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- Using RAG to generate data for model fine-tuning. ☆13 · Apr 16, 2025 · Updated last year
- ☆47 · Feb 7, 2024 · Updated 2 years ago
- Code used to create text embeddings of all Magic: The Gathering cards. ☆61 · Feb 24, 2025 · Updated last year
- Developer APIs to Accelerate LLM Projects ☆1,748 · Oct 18, 2024 · Updated last year
- Codes and packages for the paper titled Evaluating Retrieval Quality in Retrieval-Augmented Generation. ☆31 · May 21, 2025 · Updated 11 months ago
- This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed not… ☆27,098 · Apr 15, 2026 · Updated 3 weeks ago
- Interpolate between embedding points with llm ☆38 · Jul 17, 2024 · Updated last year
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆1,286 · Apr 11, 2026 · Updated 3 weeks ago
- Python library to use Pleias-RAG models ☆71 · May 1, 2025 · Updated last year
- A RAG that can scale 🧑🏻💻 ☆11 · May 28, 2024 · Updated last year
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,917 · Apr 21, 2026 · Updated 2 weeks ago
- structured outputs for llms ☆12,889 · Apr 22, 2026 · Updated 2 weeks ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆965 · Jan 1, 2026 · Updated 4 months ago
- Superlinked Inference Engine is an Open-source inference server and production cluster for embeddings, reranking, and extraction. ☆1,713 · Apr 29, 2026 · Updated last week
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,646 · Updated this week