💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
★ 10,597 · Apr 2, 2026 · Updated last week
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- 🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools · ★ 21,374 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… · ★ 159,060 · Updated this week
- Unsupervised text tokenizer for Neural Network-based text generation. · ★ 11,745 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… · ★ 9,596 · Apr 2, 2026 · Updated last week
- Minimalist ML framework for Rust · ★ 19,884 · Apr 3, 2026 · Updated last week
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...) · ★ 3,055 · Jan 13, 2026 · Updated 2 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. · ★ 32,202 · Sep 30, 2025 · Updated 6 months ago
- Papers & presentation materials from Hugging Face's internal science day · ★ 2,054 · Oct 31, 2020 · Updated 5 years ago
- State-of-the-Art Text Embeddings · ★ 18,494 · Apr 2, 2026 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) · ★ 14,361 · Oct 27, 2025 · Updated 5 months ago
- An open-source NLP research library, built on PyTorch. · ★ 11,893 · Nov 22, 2022 · Updated 3 years ago
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. · ★ 30,990 · Apr 1, 2026 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. · ★ 39,628 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" · ★ 6,502 · Jan 14, 2026 · Updated 2 months ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · ★ 3,348 · Apr 2, 2026 · Updated last week
- Rust bindings for the C++ api of PyTorch. · ★ 5,333 · Mar 26, 2026 · Updated 2 weeks ago
- Simple, safe way to store and distribute tensors · ★ 3,678 · Apr 2, 2026 · Updated last week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… · ★ 30,085 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · ★ 41,977 · Updated this week
- Extremely fast Query Engine for DataFrames, written in Rust · ★ 37,976 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more · ★ 35,311 · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python · ★ 33,425 · Mar 28, 2026 · Updated last week
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. · ★ 14,780 · Updated this week
- 🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code · ★ 2,826 · Jun 23, 2023 · Updated 2 years ago
- Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust · ★ 14,842 · Updated this week
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… · ★ 22,973 · Jul 28, 2024 · Updated last year
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators · ★ 2,373 · Mar 23, 2024 · Updated 2 years ago
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. · ★ 17,825 · Mar 27, 2026 · Updated last week
- TensorFlow code and pre-trained models for BERT · ★ 39,960 · Jul 23, 2024 · Updated last year
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. · ★ 20,895 · Apr 2, 2026 · Updated last week
- Large Language Model Text Generation Inference · ★ 10,817 · Mar 21, 2026 · Updated 2 weeks ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages · ★ 7,766 · Updated this week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models · ★ 6,151 · Jun 24, 2024 · Updated last year
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. · ★ 41,915 · Apr 2, 2026 · Updated last week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities · ★ 22,077 · Jan 23, 2026 · Updated 2 months ago
- Accessible large language models via k-bit quantization for PyTorch. · ★ 8,107 · Updated this week
- Rust bindings for the Python interpreter · ★ 15,541 · Updated this week
- A Rust machine learning framework. · ★ 4,599 · Mar 18, 2026 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★ 75,637 · Updated this week