Systemcluster / kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.

☆26

Alternatives and similar repositories for kitoken

Users who are interested in kitoken are comparing it to the libraries listed below.

beowolx / rensa
High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datas…
☆187 Updated 3 weeks ago
Dan-wanna-M / kbnf
A high-performance constrained decoding engine based on context free grammar in Rust
☆54 Updated last month
ashvardanian / jaccard-index
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
☆20 Updated last month
serega / gaoya
Locality Sensitive Hashing
☆72 Updated 2 years ago
Narsil / bloomserver
☆39 Updated 2 years ago
fbilhaut / gline-rs
Inference engine for GLiNER models, in Rust
☆61 Updated last week
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆55 Updated last year
michaelfeil / candle-flash-attn-v3
☆11 Updated 5 months ago
chenwanqq / candle-llava
implement llava using candle
☆15 Updated last year
yaman / fashion-clip-rs
A complete (gRPC service and lib) Rust inference with multilingual embedding support. This version leverages the power of Rust for both GR…
☆39 Updated 10 months ago
jquesnelle / ctranslate2-rs
Rust bindings for CTranslate2
☆14 Updated 2 years ago
MinishLab / tokenlearn
Pre-train Static Word Embeddings
☆84 Updated last month
DeployQL / LintDB
Vector Database with support for late interaction and token level embeddings.
☆55 Updated 3 weeks ago
facebookresearch / vector_db_id_compression
Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al.
☆80 Updated 5 months ago
kpu / fasterText
Library for fast text representation and classification.
☆30 Updated last year
raphaelsty / LeNLP
NLP with Rust for Python 🦀🐍
☆63 Updated 2 months ago
kyutai-labs / kaudio
Rust crate for some audio utilities
☆26 Updated 4 months ago
RAIVNLab / AdANNS
Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search"
☆65 Updated last year
trapoom555 / Language-Model-STS-CFT
Improving Text Embedding of Language Models Using Contrastive Fine-tuning
☆64 Updated 11 months ago
Knowledgator / TurboT5
Truly flash T5 realization!
☆68 Updated last year
mixedbread-ai / binary-embeddings
Showcase how mxbai-embed-large-v1 can be used to produce binary embedding. Binary embeddings enabled 32x storage savings and 40x faster r…
☆18 Updated last year
cortexlabs / nucleus
Cortex-compatible model server for Python and TensorFlow
☆17 Updated 2 years ago
raphaelsty / neural-tree
Tree-based indexes for neural-search
☆32 Updated last year
kyutai-labs / moshi-webrtc
Proof of concept for running moshi/hibiki using webrtc
☆20 Updated 4 months ago
jlscheerer / xtr-warp
XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR.
☆137 Updated 2 months ago
huggingface / ember
ANE accelerated embedding models!
☆18 Updated 7 months ago
zhao-lang / redis_hnsw
HNSW module for Redis
☆57 Updated 4 years ago
AnswerDotAI / fastkmeans
☆62 Updated last week
Pleias / Pleias-RAG-Library
Python library to use Pleias-RAG models
☆58 Updated 2 months ago
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 Updated last year