💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
★10,529 · Feb 28, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- 🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools ★21,289 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★158,060 · Updated this week
- Unsupervised text tokenizer for Neural Network-based text generation. ★11,700 · Mar 1, 2026 · Updated 2 weeks ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,563 · Updated this week
- Minimalist ML framework for Rust ★19,669 · Mar 14, 2026 · Updated last week
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...) ★3,054 · Jan 13, 2026 · Updated 2 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★32,190 · Sep 30, 2025 · Updated 5 months ago
- Papers & presentation materials from Hugging Face's internal science day ★2,054 · Oct 31, 2020 · Updated 5 years ago
- State-of-the-Art Text Embeddings ★18,390 · Mar 12, 2026 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★14,354 · Oct 27, 2025 · Updated 4 months ago
- An open-source NLP research library, built on PyTorch. ★11,893 · Nov 22, 2022 · Updated 3 years ago
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. ★30,926 · Mar 10, 2026 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. ★39,403 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,494 · Jan 14, 2026 · Updated 2 months ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ★3,325 · Mar 13, 2026 · Updated last week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… ★29,611 · Updated this week
- Rust bindings for the C++ api of PyTorch. ★5,320 · Jan 22, 2026 · Updated last month
- Simple, safe way to store and distribute tensors ★3,660 · Mar 12, 2026 · Updated last week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★41,807 · Mar 13, 2026 · Updated last week
- Extremely fast Query Engine for DataFrames, written in Rust ★37,750 · Mar 13, 2026 · Updated last week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★35,108 · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ★33,352 · Updated this week
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. ★14,602 · Updated this week
- 🚪✊ Knock Knock: Get notified when your training ends with only two additional lines of code ★2,825 · Jun 23, 2023 · Updated 2 years ago
- Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust ★14,740 · Mar 13, 2026 · Updated last week
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ★22,975 · Jul 28, 2024 · Updated last year
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ★2,371 · Mar 23, 2024 · Updated last year
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ★17,599 · Feb 8, 2026 · Updated last month
- TensorFlow code and pre-trained models for BERT ★39,917 · Jul 23, 2024 · Updated last year
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★20,809 · Updated this week
- Large Language Model Text Generation Inference ★10,803 · Jan 8, 2026 · Updated 2 months ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,740 · Updated this week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models ★6,152 · Jun 24, 2024 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★22,046 · Jan 23, 2026 · Updated last month
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ★41,799 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ★8,052 · Updated this week
- Rust bindings for the Python interpreter ★15,462 · Updated this week
- A Rust machine learning framework. ★4,580 · Mar 9, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ★73,479 · Updated this week