Code for explaining and evaluating late chunking (chunked pooling)
☆516Dec 23, 2024Updated last year
Alternatives and similar repositories for late-chunking
Users who are interested in late-chunking are comparing it to the libraries listed below.
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp…☆210Aug 31, 2024Updated last year
- Fast BM25 search in Python, powered by Numpy and Numba☆1,690May 18, 2026Updated last week
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆89Jan 18, 2025Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.☆1,615Dec 20, 2025Updated 5 months ago
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception☆274Sep 25, 2025Updated 8 months ago
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João…☆107Feb 9, 2026Updated 3 months ago
- Code for KaLM-Embedding models☆118Jun 30, 2025Updated 10 months ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"☆242Feb 26, 2026Updated 3 months ago
- Query Expansion for Better Query Embedding using LLMs☆69Feb 18, 2025Updated last year
- ☆24Jan 30, 2025Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…☆160Jul 14, 2025Updated 10 months ago
- Late Interaction Models Training & Retrieval☆821May 22, 2026Updated last week
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation☆308Oct 18, 2024Updated last year
- High-performance retrieval engine for unstructured data☆1,583Nov 10, 2025Updated 6 months ago
- Retrieval and Retrieval-augmented LLMs☆11,722Apr 22, 2026Updated last month
- Use late-interaction multi-modal models such as ColPali in just a few lines of code.☆847Jan 28, 2025Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…☆3,932May 17, 2025Updated last year
- [ACL 2025] AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark☆165Mar 29, 2026Updated 2 months ago
- A blazing fast inference solution for text embeddings models☆4,826Updated this week
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.…☆489Dec 13, 2025Updated 5 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.☆2,637May 19, 2026Updated last week
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)☆3,496Apr 10, 2026Updated last month
- The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval☆1,680Sep 3, 2024Updated last year
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆38Oct 16, 2025Updated 7 months ago
- RAGChecker: A Fine-grained Framework For Diagnosing RAG☆1,088Dec 13, 2024Updated last year
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral]☆46Aug 25, 2025Updated 9 months ago
- Simply, faster, sentence-transformers☆144Aug 27, 2024Updated last year
- TAG-Bench: A benchmark for table-augmented generation (TAG)☆768Apr 4, 2025Updated last year
- A simple, easy-to-hack GraphRAG implementation☆3,856Jan 27, 2026Updated 4 months ago
- Small python package to measure OCR quality and other related metrics.☆26Feb 19, 2024Updated 2 years ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.☆632May 6, 2026Updated 3 weeks ago
- Empowering RAG with a memory-based data interface for all-purpose applications!☆2,243Sep 11, 2025Updated 8 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…☆3,229May 18, 2026Updated last week
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean…☆14,749May 18, 2026Updated last week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding☆2,973Updated this week
- Knowledge Agents and Management in the Cloud☆4,251May 18, 2026Updated last week
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…☆1,214Apr 8, 2026Updated last month
- AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation☆4,784May 19, 2026Updated last week
- ☆150Jul 19, 2024Updated last year