This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and includes implementations of several novel chunking strategies.
☆476 · Dec 13, 2025 · Updated 2 months ago
Alternatives and similar repositories for chunking_evaluation
Users who are interested in chunking_evaluation are comparing it to the libraries listed below.
- ☆33 · Jun 17, 2024 · Updated last year
- ☆21 · Oct 14, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆41 · Jun 2, 2025 · Updated 9 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,507 · Feb 17, 2026 · Updated 2 weeks ago
- Evaluation framework for document processing models and services. ☆65 · Feb 12, 2026 · Updated 3 weeks ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,868 · May 17, 2025 · Updated 9 months ago
- Fast Multimodal Semantic Deduplication & Filtering ☆892 · Jan 20, 2026 · Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,602 · Dec 20, 2025 · Updated 2 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆490 · Dec 23, 2024 · Updated last year
- Smart reproducible analytical pipeline inspection ☆21 · Feb 13, 2026 · Updated 3 weeks ago
- ☆1,442 · Jun 18, 2024 · Updated last year
- Query Only Linear Adapter Training for Fine Tuned Embedding Model Query Representation ☆28 · Sep 12, 2024 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 · Jun 30, 2025 · Updated 8 months ago
- Supercharge Your LLM Application Evaluations 🚀 ☆12,826 · Feb 24, 2026 · Updated last week
- ☆198 · May 5, 2024 · Updated last year
- ☆221 · Jan 18, 2026 · Updated last month
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- Chunk your text using gpt4o-mini more accurately ☆44 · Aug 3, 2024 · Updated last year
- Optimize Document Retrieval with Fine-Tuned KnowledgeBases ☆183 · Nov 5, 2025 · Updated 4 months ago
- Developer APIs to Accelerate LLM Projects ☆1,744 · Oct 18, 2024 · Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 5 months ago
- Python library to use Pleias-RAG models ☆68 · May 1, 2025 · Updated 10 months ago
- ☆47 · Feb 7, 2024 · Updated 2 years ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆845 · Jan 28, 2025 · Updated last year
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆947 · Jan 1, 2026 · Updated 2 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆159 · Jul 14, 2025 · Updated 7 months ago
- structured outputs for llms ☆12,468 · Feb 25, 2026 · Updated last week
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆1,251 · Feb 26, 2026 · Updated last week
- ☆16 · Jul 13, 2024 · Updated last year
- Benchmarking library for RAG ☆261 · Feb 15, 2026 · Updated 3 weeks ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆980 · May 3, 2024 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,895 · Updated this week
- This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information r… ☆25,832 · Feb 17, 2026 · Updated 2 weeks ago
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ☆2,703 · Feb 5, 2026 · Updated last month
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,759 · Updated this week
- Fork of OpenAI's Realtime Console, adapted for Vocal RAG ☆36 · Oct 18, 2024 · Updated last year
- Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured… ☆1,500 · Dec 10, 2025 · Updated 2 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆161 · Apr 3, 2024 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆574 · Feb 14, 2026 · Updated 3 weeks ago