benbrandt / text-splitter
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
★317 · Updated this week
Alternatives and similar repositories for text-splitter:
Users who are interested in text-splitter are comparing it to the libraries listed below.
- Rust library for generating vector embeddings, reranking locally (★402 · updated this week)
- Production-Ready Inference, Ingestion and Indexing built in Rust 🦀 (★408 · updated this week)
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. (★224 · updated last week)
- LLM Orchestrator built in Rust (★267 · updated 10 months ago)
- Structured generation in Rust (★168 · updated this week)
- Fast, streaming indexing, query, and agent library for building LLM applications in Rust (★352 · updated this week)
- Efficient platform for inference and serving local LLMs including an OpenAI compatible API server. (★290 · updated last week)
- (★167 · updated this week)
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. (★1,246 · updated this week)
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… (★140 · updated 3 months ago)
- Code for explaining and evaluating late chunking (chunked pooling) (★311 · updated 3 weeks ago)
- (★198 · updated last month)
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… (★170 · updated 4 months ago)
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. (★401 · updated 11 months ago)
- Neural search for websites, docs, articles - online! (★130 · updated 2 months ago)
- (★136 · updated 11 months ago)
- This package, developed as part of our research detailed in the Chroma Technical Report, provides tools for text chunking and evaluation.… (★208 · updated 3 months ago)
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… (★206 · updated last week)
- A realtime serving engine for Data-Intensive Generative AI Applications (★951 · updated this week)
- (★195 · updated 8 months ago)
- Extract structured text from PDFs quickly (★383 · updated this week)
- The Easiest Rust Interface for Local LLMs and an Interface for Deterministic Signals from Probabilistic LLM Vibes (★163 · updated this week)
- (★119 · updated last month)
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding (★1,675 · updated this week)
- AgentSearch is a framework for powering search agents and enabling customizable local search. (★462 · updated 8 months ago)
- Retrieve, Read and LinK: Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget (ACL 2024) (★368 · updated 3 months ago)
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. (★694 · updated 2 months ago)
- A simple Python sandbox for helpful LLM data agents (★210 · updated 7 months ago)
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… (★722 · updated last month)
- (★206 · updated 6 months ago)