benbrandt / text-splitter

Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.

☆456

Alternatives and similar repositories for text-splitter

Users who are interested in text-splitter are comparing it to the libraries listed below.

StarlightSearch / EmbedAnything
Production-ready Inference, Ingestion and Indexing built in Rust 🦀
☆656 Updated last week
Anush008 / fastembed-rs
Rust library for generating vector embeddings, reranking.
☆559 Updated 2 weeks ago
santiagomed / orca
LLM Orchestrator built in Rust
☆281 Updated last year
EricLBuehler / candle-vllm
Efficient platform for inference and serving local LLMs including an OpenAI compatible API server.
☆397 Updated 2 weeks ago
dottxt-ai / outlines-core
Faster structured generation
☆234 Updated 2 months ago
isaacus-dev / semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
☆344 Updated last month
ShelbyJenkins / llm_client
The Easiest Rust Interface for Local LLMs and an Interface for Deterministic Signals from Probabilistic LLM Vibes
☆207 Updated 5 months ago
guidance-ai / llguidance
Super-fast Structured Outputs
☆337 Updated last week
qdrant / page-search
Neural search for web-sites, docs, articles - online!
☆135 Updated 2 months ago
bosun-ai / swiftide
Fast, streaming indexing, query, and agentic LLM applications in Rust
☆517 Updated this week
beowolx / rensa
High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas…
☆190 Updated last week
qdrant / rust-client
Rust client for Qdrant vector search engine
☆309 Updated last week
aurelio-labs / semantic-chunkers
☆231 Updated last month
jondot / awesome-rust-llm
🦀 A curated list of Rust tools, libraries, and frameworks for working with LLMs, GPT, AI
☆447 Updated last year
fbilhaut / gline-rs
Inference engine for GLiNER models, in Rust
☆64 Updated 3 weeks ago
huggingface / ratchet
A cross-platform browser ML framework.
☆709 Updated 8 months ago
edwinkys / oasysdb
In-memory vector store with efficient read and write performance for semantic caching and retrieval system. Redis for Semantic Caching.
☆366 Updated 7 months ago
huggingface / hf-hub
Rust client for the huggingface hub aiming for a minimal subset of the features of the `huggingface-hub` python package
☆218 Updated last month
qdrant / qdrant-web-ui
Self-hosted web UI for Qdrant
☆299 Updated last week
tensorlakeai / indexify
A realtime serving engine for Data-Intensive Generative AI Applications
☆1,036 Updated this week
edgenai / llama_cpp-rs
High-level, optionally asynchronous Rust bindings to llama.cpp
☆224 Updated last year
utilityai / llama-cpp-rs
☆328 Updated this week
AmineDiro / cria
OpenAI compatible API for serving LLAMA-2 model
☆218 Updated last year
agamm / semantic-split
A Python library to chunk/group your texts based on semantic similarity.
☆97 Updated last year
qdrant / fastembed
Fast, Accurate, Lightweight Python library to make State of the Art Embedding
☆2,236 Updated this week
tyrchen / qdrant-lib
Extract core logic from qdrant and make it available as a library.
☆60 Updated last year
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,503 Updated last month
cpcdoy / rust-sbert
Rust port of sentence-transformers (https://github.com/UKPLab/sentence-transformers)
☆117 Updated 10 months ago
zurawiki / tiktoken-rs
Ready-made tokenizer library for working with GPT and tiktoken
☆325 Updated this week
PrithivirajDamodaran / FlashRank
Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro…
☆835 Updated 3 weeks ago