This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and includes implementations of several novel chunking strategies.
☆489 · Dec 13, 2025 · Updated 5 months ago
Alternatives and similar repositories for chunking_evaluation
Users that are interested in chunking_evaluation are comparing it to the libraries listed below.
- ☆34 · Jun 17, 2024 · Updated last year
- Applying domain specific evaluations to RAG chunking and embedding functions ☆18 · Dec 25, 2024 · Updated last year
- An overview of popular reranking models and architectures for 2 stage RAG pipelines ☆21 · Jun 10, 2025 · Updated 11 months ago
- Fast BM25 search in Python, powered by Numpy and Numba ☆1,690 · May 18, 2026 · Updated last week
- ☆22 · Oct 14, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆47 · Jun 2, 2025 · Updated 11 months ago
- ☆12 · Apr 27, 2026 · Updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆516 · Dec 23, 2024 · Updated last year
- Fork of OpenAI's Realtime Console, adapted for Vocal RAG ☆36 · Oct 18, 2024 · Updated last year
- ☆21 · Nov 26, 2024 · Updated last year
- Optimize Document Retrieval with Fine-Tuned KnowledgeBases ☆185 · Nov 5, 2025 · Updated 6 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆23 · Jun 30, 2025 · Updated 11 months ago
- ☆1,464 · Jun 18, 2024 · Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,932 · May 17, 2025 · Updated last year
- Evaluation framework for document processing models and services. ☆73 · May 15, 2026 · Updated 2 weeks ago
- Fast Multimodal Semantic Deduplication & Filtering ☆933 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,615 · Dec 20, 2025 · Updated 5 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 8 months ago
- Chunk your text using gpt4o-mini more accurately ☆44 · Aug 3, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations 🚀 ☆14,123 · Feb 24, 2026 · Updated 3 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆160 · Jul 14, 2025 · Updated 10 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆847 · Jan 28, 2025 · Updated last year
- Convert PowerPoint files into semantically rich text using vision language models ☆113 · Nov 12, 2025 · Updated 6 months ago
- ☆237 · Jan 18, 2026 · Updated 4 months ago
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆107 · Feb 9, 2026 · Updated 3 months ago
- Smart reproducible analytical pipeline inspection ☆21 · Feb 13, 2026 · Updated 3 months ago
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- helix-db python lib + workflows ☆35 · Nov 12, 2025 · Updated 6 months ago
- Using RAG to generate data for model fine-tuning. ☆14 · Apr 16, 2025 · Updated last year
- ☆47 · Feb 7, 2024 · Updated 2 years ago
- Code used to create text embeddings of all Magic: The Gathering cards. ☆62 · Feb 24, 2025 · Updated last year
- Developer APIs to Accelerate LLM Projects ☆1,749 · Oct 18, 2024 · Updated last year
- This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each technique has a detailed not… ☆27,528 · Updated this week
- Interpolate between embedding points with llm ☆38 · Jul 17, 2024 · Updated last year
- Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way. ☆1,293 · Apr 11, 2026 · Updated last month
- Python library to use Pleias-RAG models ☆71 · May 8, 2026 · Updated 3 weeks ago
- A RAG that can scale 🧑🏻💻 ☆11 · May 28, 2024 · Updated 2 years ago
- Fast tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tok… ☆49 · May 10, 2026 · Updated 2 weeks ago
- structured outputs for llms ☆13,023 · Updated this week