This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation. It allows users to compare different chunking methods and includes implementations of several novel chunking strategies.
☆481 · Dec 13, 2025 · Updated 3 months ago
Alternatives and similar repositories for chunking_evaluation
Users that are interested in chunking_evaluation are comparing it to the libraries listed below.
- ☆33 · Jun 17, 2024 · Updated last year
- Fast lexical search implementing BM25 in Python ☆1,596 · Mar 17, 2026 · Updated last week
- ☆22 · Oct 14, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆43 · Jun 2, 2025 · Updated 9 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆495 · Dec 23, 2024 · Updated last year
- ☆1,447 · Jun 18, 2024 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆22 · Jun 30, 2025 · Updated 9 months ago
- Evaluation framework for document processing models and services. ☆67 · Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,889 · May 17, 2025 · Updated 10 months ago
- Fast Multimodal Semantic Deduplication & Filtering ☆906 · Jan 20, 2026 · Updated 2 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,605 · Dec 20, 2025 · Updated 3 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 6 months ago
- Chunk your text using gpt4o-mini more accurately ☆44 · Aug 3, 2024 · Updated last year
- Supercharge Your LLM Application Evaluations 🚀 ☆13,106 · Feb 24, 2026 · Updated last month
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆160 · Jul 14, 2025 · Updated 8 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆845 · Jan 28, 2025 · Updated last year
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆94 · Feb 9, 2026 · Updated last month
- ☆226 · Jan 18, 2026 · Updated 2 months ago
- Smart reproducible analytical pipeline inspection ☆21 · Feb 13, 2026 · Updated last month
- Library for evaluating RAG using Nuclia's models ☆18 · Jul 31, 2024 · Updated last year
- Using RAG to generate data for model fine-tuning. ☆13 · Apr 16, 2025 · Updated 11 months ago
- Developer APIs to Accelerate LLM Projects ☆1,749 · Oct 18, 2024 · Updated last year
- This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information r… ☆26,335 · Feb 17, 2026 · Updated last month
- Codes and packages for the paper titled Evaluating Retrieval Quality in Retrieval-Augmented Generation. ☆31 · May 21, 2025 · Updated 10 months ago
- Interpolate between embedding points with llm ☆38 · Jul 17, 2024 · Updated last year
- Python library to use Pleias-RAG models ☆70 · May 1, 2025 · Updated 10 months ago
- Fast tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tok… ☆48 · Updated this week
- A RAG that can scale 🧑🏻💻 ☆11 · May 28, 2024 · Updated last year
- structured outputs for llms ☆12,589 · Updated this week
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆956 · Jan 1, 2026 · Updated 2 months ago
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆14,340 · Updated this week
- Implementation of StrongDM's Attractor spec (https://github.com/strongdm/attractor) in Rust ☆26 · Mar 9, 2026 · Updated 3 weeks ago
- ☆39 · Nov 21, 2022 · Updated 3 years ago
- ☆12 · Dec 8, 2022 · Updated 3 years ago
- ☆198 · May 5, 2024 · Updated last year
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,815 · Updated this week
- Benchmarking library for RAG ☆263 · Mar 11, 2026 · Updated 2 weeks ago
- SPLADE: sparse neural search (SIGIR21, SIGIR22) ☆984 · May 3, 2024 · Updated last year
- A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B. ☆254 · Jan 28, 2025 · Updated last year