Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers, Tiktoken and more. Supports BPE, Unigram and WordPiece tokenization in JavaScript, Python and Rust.
☆44 · Oct 10, 2025 · Updated 4 months ago
Alternatives and similar repositories for kitoken
Users who are interested in kitoken are comparing it to the libraries listed below
- PHP low-level client for Vespa. https://vespa.ai/ ☆17 · Jan 22, 2026 · Updated last month
- gRPC server for hnswlib ☆16 · Mar 6, 2023 · Updated 2 years ago
- Normalize text string ☆12 · Nov 6, 2018 · Updated 7 years ago
- zero-vocab or low-vocab embeddings ☆18 · Jul 17, 2022 · Updated 3 years ago
- ☆21 · Apr 16, 2024 · Updated last year
- Model implementation for the contextual embeddings project ☆40 · Jun 2, 2025 · Updated 8 months ago
- RATransformers 🐭 - Make your transformer (like BERT, RoBERTa, GPT-2 and T5) Relation Aware! ☆42 · Dec 14, 2022 · Updated 3 years ago
- Blazingly fast Markdown parser for Python written in Rust. ☆39 · Updated this week
- ☆13 · Nov 15, 2017 · Updated 8 years ago
- A memory allocator that aims to eliminate dangling pointer vulnerabilities at a low overhead, using virtualisation via Dune. My Computer … ☆10 · Nov 27, 2019 · Updated 6 years ago
- RaBitQ Rust implementation ☆10 · Feb 4, 2026 · Updated 3 weeks ago
- Statistical discontinuous constituent parsing ☆11 · Feb 15, 2018 · Updated 8 years ago
- Narwhal is a keyword and KEY NARRATIVE manager that creates language-aware classes. Because Narwhal does not use NLP, it avoids complexity… ☆12 · Oct 16, 2018 · Updated 7 years ago
- Walks through building different HTML5 layouts for AV systems ☆12 · Oct 15, 2021 · Updated 4 years ago
- Pure D implementation of SHA-3 (Keccak-f[1600,24]) + DUB package ☆12 · Sep 15, 2025 · Updated 5 months ago
- TSDG: An efficient index graph for graph-based nearest neighbor search ☆10 · Jul 14, 2022 · Updated 3 years ago
- MSVBASE is a system that efficiently supports complex queries of both approximate similarity search and relational operators. It integrat… ☆103 · Nov 19, 2024 · Updated last year
- ☆51 · Jun 21, 2025 · Updated 8 months ago
- Lazy reading of file objects for efficient batch processing ☆10 · Sep 6, 2017 · Updated 8 years ago
- Simplify the prediction process for a fine-tuned BERT model ☆11 · Jun 19, 2019 · Updated 6 years ago
- An R package to convert SingleCellExperiment and Seurat objects into anndata as comprehensively as possible. ☆11 · Apr 23, 2025 · Updated 10 months ago
- 🏆 The winning code for the NeurIPS'23 BigANN Competition OOD and Sparse tracks. ☆14 · Jun 17, 2025 · Updated 8 months ago
- Yet Another SEquence Tagger ☆10 · Dec 8, 2022 · Updated 3 years ago
- Distributed hash-table node ☆12 · Oct 2, 2023 · Updated 2 years ago
- Question generation from text ☆15 · Sep 19, 2012 · Updated 13 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Jan 26, 2021 · Updated 5 years ago
- ☆13 · Dec 9, 2024 · Updated last year
- ☆12 · Jan 15, 2019 · Updated 7 years ago
- TextMate support for D ☆13 · Feb 22, 2024 · Updated 2 years ago
- LBFGS optimization algorithm ported from liblbfgs ☆12 · Nov 25, 2022 · Updated 3 years ago
- ☆11 · Jun 1, 2024 · Updated last year
- A PyTorch implementation of SimSiam based on CVPR 2021 paper "Exploring Simple Siamese Representation Learning" ☆12 · Mar 23, 2021 · Updated 4 years ago
- Logging utilities intended for use in highly loaded applications ☆12 · Nov 10, 2017 · Updated 8 years ago
- Conditional Random Fields implemented as Lasagne layer ☆10 · Jul 22, 2016 · Updated 9 years ago
- Materials and source code for the NLP2025 tutorial "Geographic Information and Language Processing: A Practical Introduction" ☆17 · Feb 20, 2026 · Updated last week
- Frontend (and soon also middleware and backend) for a new, open-source image generation platform. ☆14 · Nov 5, 2022 · Updated 3 years ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models ☆11 · Jan 19, 2024 · Updated 2 years ago
- Server wrapper for ML models ☆11 · Sep 11, 2019 · Updated 6 years ago
- Source code for "N-ary Constituent Tree Parsing with Recursive Semi-Markov Model" published at ACL 2021 ☆10 · May 27, 2021 · Updated 4 years ago