benbrandt / text-splitter
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
☆290 Updated this week
Related projects
Alternatives and complementary repositories for text-splitter
- A minimalist yet highly performant, lightweight, lightning-fast, multisource, multimodal, and local embedding solution, built in Rust. ☆292 Updated this week
- ☆180 Updated last week
- Neural search for web-sites, docs, articles - online! ☆128 Updated 3 weeks ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆133 Updated last month
- A fast and lightweight pure Python library for splitting text into semantically meaningful chunks. ☆182 Updated 4 months ago
- Library for generating vector embeddings, reranking in Rust ☆285 Updated this week
- Fast, streaming indexing and query library for AI (RAG) applications, written in Rust ☆257 Updated this week
- Structured generation in Rust ☆128 Updated this week
- Efficient platform for inference and serving local LLMs, including an OpenAI-compatible API server. ☆265 Updated last month
- LLM Orchestrator built in Rust ☆267 Updated 8 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,095 Updated last week
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆385 Updated 9 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆85 Updated 4 months ago
- Code for explaining and evaluating late chunking (chunked pooling) ☆246 Updated last month
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… ☆160 Updated last month
- ☆23 Updated 5 months ago
- ☆162 Updated 3 weeks ago
- A simple Python sandbox for helpful LLM data agents ☆170 Updated 5 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆162 Updated 2 months ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆160 Updated last week
- ☆204 Updated 4 months ago
- A cross-platform browser ML framework. ☆623 Updated this week
- ☆136 Updated 9 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆200 Updated 6 months ago
- 🦜💯 Flex those feathers! ☆234 Updated 3 weeks ago
- 🦀 A curated list of Rust tools, libraries, and frameworks for working with LLMs, GPT, AI ☆286 Updated 8 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆311 Updated last week
- Solving data for LLMs - Create quality synthetic datasets! ☆137 Updated last month
- Extract structured text from PDFs quickly ☆340 Updated 3 weeks ago
- A realtime serving engine for Data-Intensive Generative AI Applications ☆914 Updated this week