chonkie-inc / chonkie
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library
☆284 Updated this week
Alternatives and similar repositories for chonkie:
Users who are interested in chonkie are comparing it to the libraries listed below:
- OCR Benchmark ☆464 Updated last week
- Benchmark Large Language Models Reliably On Your Data ☆240 Updated last week
- Fast Semantic Text Deduplication ☆638 Updated this week
- LettuceDetect is a hallucination detection framework for RAG applications. ☆385 Updated 2 weeks ago
- Deep Research for your internal data ☆310 Updated this week
- Structured information extraction from documents ☆313 Updated 6 months ago
- Unlock 650+ MCP servers tools in your favorite agentic framework. ☆296 Updated last week
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd… ☆100 Updated 2 weeks ago
- Semantic Chunker is a lightweight Python package for semantically-aware chunking and clustering of text. ☆131 Updated last week
- ☆221 Updated 6 months ago
- ☆121 Updated last week
- A Lightweight Library for AI Observability ☆241 Updated 2 months ago
- PageIndex: Document Index System for Reasoning-Based RAG ☆487 Updated this week
- A simple Python program to implement the search-extract-summarize flow. ☆260 Updated 3 months ago
- A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support… ☆354 Updated last week
- Fast State-of-the-Art Static Embeddings ☆1,359 Updated this week
- A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B. ☆175 Updated 2 months ago
- Fully neural approach for text chunking ☆319 Updated this week
- A simple tool that lets you explore different possible paths that an LLM might sample. ☆163 Updated last week
- ☆92 Updated 4 months ago
- HawkinsDB is our take on giving AI systems a more human-like way to store and recall information, inspired by how our own brains work. Ba… ☆168 Updated 3 months ago
- Together Open Deep Research ☆220 Updated last week
- II-Researcher: a new open-source framework designed to aid building search / research agents ☆240 Updated this week
- Prompt engineering, automated. ☆301 Updated this week
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆271 Updated last week
- ☆428 Updated last week
- See Through Your Models ☆381 Updated last month
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆775 Updated 2 months ago
- Sidecar is the AI brains for the Aide editor and works alongside it, locally on your machine ☆542 Updated 2 weeks ago
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆892 Updated 2 months ago