brandonstarxel / chunking_evaluation
This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and includes implementations of several novel chunking strategies.
☆472 · Dec 13, 2025 · Updated 2 months ago
Alternatives and similar repositories for chunking_evaluation
Users interested in chunking_evaluation are comparing it to the libraries listed below.
- ☆32 · Jun 17, 2024 · Updated last year
- Applying domain specific evaluations to RAG chunking and embedding functions ☆18 · Dec 25, 2024 · Updated last year
- ☆21 · Oct 14, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆40 · Jun 2, 2025 · Updated 8 months ago
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,477 · Feb 4, 2026 · Updated last week
- Evaluation framework for document processing models and services. ☆63 · Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,852 · May 17, 2025 · Updated 8 months ago
- Fast Multimodal Semantic Deduplication & Filtering ☆886 · Jan 20, 2026 · Updated 3 weeks ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,594 · Dec 20, 2025 · Updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆487 · Dec 23, 2024 · Updated last year
- Smart reproducible analytical pipeline inspection ☆21 · Jan 4, 2026 · Updated last month
- ☆1,439 · Jun 18, 2024 · Updated last year
- ☆215 · Jan 18, 2026 · Updated 3 weeks ago
- ☆198 · May 5, 2024 · Updated last year
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- Chunk your text using gpt4o-mini more accurately ☆44 · Aug 3, 2024 · Updated last year
- Codes and packages for the paper titled Evaluating Retrieval Quality in Retrieval-Augmented Generation. ☆30 · May 21, 2025 · Updated 8 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 4 months ago
- Developer APIs to Accelerate LLM Projects ☆1,741 · Oct 18, 2024 · Updated last year
- Python library to use Pleias-RAG models ☆68 · May 1, 2025 · Updated 9 months ago
- ☆47 · Feb 7, 2024 · Updated 2 years ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆843 · Jan 28, 2025 · Updated last year
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆938 · Jan 1, 2026 · Updated last month
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆156 · Jul 14, 2025 · Updated 6 months ago
- structured outputs for llms ☆12,357 · Updated this week
- ☆12 · Feb 6, 2026 · Updated last week
- Chat Markup Language conversation library ☆55 · Jan 3, 2024 · Updated 2 years ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆977 · May 3, 2024 · Updated last year
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆2,806 · Updated this week
- This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information r… ☆24,625 · Feb 3, 2026 · Updated last week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,703 · Jan 9, 2026 · Updated last month
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ☆2,661 · Feb 5, 2026 · Updated last week
- Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured… ☆1,489 · Dec 10, 2025 · Updated 2 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆160 · Apr 3, 2024 · Updated last year
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆560 · Oct 28, 2025 · Updated 3 months ago
- A practical, explainable and effective method for reducing bias in machine learning algorithms. ☆22 · May 17, 2020 · Updated 5 years ago
- A curated collection of awesome applications and tools that utilize large language models (LLMs) with retrieval-augmented generation (RAG… ☆16 · Dec 25, 2025 · Updated last month
- A library for structural-semantic chunking of documents. ☆12 · Oct 8, 2025 · Updated 4 months ago
- A RAG that can scale 🧑🏻💻 ☆11 · May 28, 2024 · Updated last year