This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and includes implementations of several novel chunking strategies.
☆493 · Dec 13, 2025 · Updated 6 months ago
Alternatives and similar repositories for chunking_evaluation
Users that are interested in chunking_evaluation are comparing it to the libraries listed below.
- An Overview of the Latest Document Chunking Research ☆91 · Nov 25, 2024 · Updated last year
- ☆34 · Jun 17, 2024 · Updated 2 years ago
- Applying domain specific evaluations to RAG chunking and embedding functions ☆18 · Dec 25, 2024 · Updated last year
- An overview of popular reranking models and architectures for 2 stage RAG pipelines ☆22 · Jun 10, 2025 · Updated last year
- Fast BM25 search in Python, powered by Numpy and Numba ☆1,715 · Jun 11, 2026 · Updated 2 weeks ago
- Query Only Linear Adapter Training for Fine Tuned Embedding Model Query Representation ☆28 · Sep 12, 2024 · Updated last year
- ☆22 · Oct 14, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated last year
- ☆12 · Apr 27, 2026 · Updated 2 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆525 · Dec 23, 2024 · Updated last year
- Fork of OpenAI's Realtime Console, adapted for Vocal RAG ☆36 · Oct 18, 2024 · Updated last year
- ☆21 · Nov 26, 2024 · Updated last year
- Optimize Document Retrieval with Fine-Tuned KnowledgeBases ☆186 · Nov 5, 2025 · Updated 7 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆24 · Jun 30, 2025 · Updated last year
- ☆1,470 · Jun 18, 2024 · Updated 2 years ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,938 · May 17, 2025 · Updated last year
- Evaluation framework for document processing models and services. ☆76 · May 28, 2026 · Updated last month
- Fast Multimodal Semantic Deduplication & Filtering ☆940 · May 24, 2026 · Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,621 · Dec 20, 2025 · Updated 6 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 9 months ago
- Chunk your text using gpt4o-mini more accurately ☆44 · Aug 3, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations 🚀 ☆14,523 · Feb 24, 2026 · Updated 4 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆161 · Jul 14, 2025 · Updated 11 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆850 · Jan 28, 2025 · Updated last year
- Convert PowerPoint files into semantically rich text using vision language models ☆113 · Nov 12, 2025 · Updated 7 months ago
- ☆241 · Jan 18, 2026 · Updated 5 months ago
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆108 · Feb 9, 2026 · Updated 4 months ago
- Smart reproducible analytical pipeline inspection ☆21 · Feb 13, 2026 · Updated 4 months ago
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- Using RAG to generate data for model fine-tuning. ☆14 · Apr 16, 2025 · Updated last year
- ☆47 · Feb 7, 2024 · Updated 2 years ago
- Developer APIs to Accelerate LLM Projects ☆1,747 · Oct 18, 2024 · Updated last year
- Codes and packages for the paper titled Evaluating Retrieval Quality in Retrieval-Augmented Generation. ☆32 · May 21, 2025 · Updated last year
- This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed not… ☆28,225 · Jun 17, 2026 · Updated last week
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆1,306 · Apr 11, 2026 · Updated 2 months ago
- Python library to use Pleias-RAG models ☆72 · Jun 20, 2026 · Updated last week
- A RAG that can scale 🧑🏻 💻 ☆11 · May 28, 2024 · Updated 2 years ago
- Fast tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tok… ☆55 · May 10, 2026 · Updated last month
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆3,058 · Jun 23, 2026 · Updated last week