Systemcluster / kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
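As a rough illustration of what BPE (byte pair encoding) tokenization involves, here is a minimal, simplified sketch of applying a ranked merge list to a single word. This is not kitoken's API or implementation. The function name `bpe_encode` and the toy merge list are hypothetical, and real tokenizers add byte-level handling, pre-tokenization and vocabulary lookups on top of this.

```python
def bpe_encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Simplified BPE: repeatedly merge the adjacent pair with the lowest rank.

    `merges` is a hypothetical ranked merge list; earlier entries were learned
    first during training and are therefore applied first here.
    """
    # Map each merge pair to its rank (position in the list).
    ranks = {pair: i for i, pair in enumerate(merges)}
    # Start from individual characters.
    symbols = list(word)
    while len(symbols) > 1:
        # Find the leftmost adjacent pair with the lowest rank.
        best_index = None
        best_rank = None
        for i in range(len(symbols) - 1):
            rank = ranks.get((symbols[i], symbols[i + 1]))
            if rank is not None and (best_rank is None or rank < best_rank):
                best_index, best_rank = i, rank
        if best_index is None:
            break  # no applicable merges remain
        # Replace the two symbols with their concatenation.
        symbols[best_index:best_index + 2] = [symbols[best_index] + symbols[best_index + 1]]
    return symbols


if __name__ == "__main__":
    toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]
    print(bpe_encode("lower", toy_merges))
```

With the toy merges above, `"lower"` is split into `["low", "er"]`. Unigram and WordPiece, the other two schemes named in the description, select tokens differently (by likelihood and by greedy longest-match against a vocabulary, respectively).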

Alternatives and similar repositories for kitoken:

Users who are interested in kitoken are comparing it to the libraries listed below.