💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
★10,485 · Updated this week
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- 🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools · ★21,228 · Updated this week
- Unsupervised text tokenizer for Neural Network-based text generation. · ★11,668 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… · ★157,071 · Updated this week
- Minimalist ML framework for Rust · ★19,455 · Feb 19, 2026 · Updated last week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… · ★9,513 · Updated this week
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...) · ★3,042 · Jan 13, 2026 · Updated last month
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. · ★32,170 · Sep 30, 2025 · Updated 5 months ago
- State-of-the-Art Text Embeddings · ★18,298 · Feb 20, 2026 · Updated last week
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. · ★30,860 · Feb 21, 2026 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. · ★39,195 · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) · ★14,359 · Oct 27, 2025 · Updated 4 months ago
- An open-source NLP research library, built on PyTorch. · ★11,889 · Nov 22, 2022 · Updated 3 years ago
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… · ★29,102 · Updated this week
- Extremely fast Query Engine for DataFrames, written in Rust · ★37,513 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" · ★6,490 · Jan 14, 2026 · Updated last month
- Rust bindings for the C++ api of PyTorch. · ★5,289 · Jan 22, 2026 · Updated last month
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more · ★34,940 · Updated this week
- Papers & presentation materials from Hugging Face's internal science day · ★2,052 · Oct 31, 2020 · Updated 5 years ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · ★41,648 · Updated this week
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · ★3,296 · Feb 9, 2026 · Updated 2 weeks ago
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. · ★14,419 · Updated this week
- Simple, safe way to store and distribute tensors · ★3,637 · Updated this week
- Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust · ★14,608 · Feb 20, 2026 · Updated last week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python · ★33,254 · Nov 27, 2025 · Updated 3 months ago
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. · ★17,384 · Feb 8, 2026 · Updated 2 weeks ago
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… · ★22,981 · Jul 28, 2024 · Updated last year
- Large Language Model Text Generation Inference · ★10,774 · Jan 8, 2026 · Updated last month
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models · ★6,150 · Jun 24, 2024 · Updated last year
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. · ★20,678 · Updated this week
- Rust bindings for the Python interpreter · ★15,368 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. · ★41,516 · Updated this week
- TensorFlow code and pre-trained models for BERT · ★39,875 · Jul 23, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities · ★22,033 · Jan 23, 2026 · Updated last month
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages · ★7,729 · Updated this week
- ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator · ★19,389 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★71,234 · Updated this week
- 🚪✊ Knock Knock: Get notified when your training ends with only two additional lines of code · ★2,828 · Jun 23, 2023 · Updated 2 years ago
- Accessible large language models via k-bit quantization for PyTorch. · ★7,997 · Updated this week
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators · ★2,371 · Mar 23, 2024 · Updated last year