🔥 Fast State-of-the-Art Tokenizers optimized for Research and Production
10,850 stars · Jun 26, 2026 · Updated this week
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- 🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools · 21,648 stars · Jun 18, 2026 · Updated last week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… · 161,885 stars · Updated this week
- Unsupervised text tokenizer for Neural Network-based text generation. · 11,925 stars · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… · 9,737 stars · Jun 22, 2026 · Updated last week
- Minimalist ML framework for Rust · 20,562 stars · Updated this week
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...) · 3,067 stars · Jan 13, 2026 · Updated 5 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. · 32,233 stars · Sep 30, 2025 · Updated 9 months ago
- Papers & presentation materials from Hugging Face's internal science day · 2,053 stars · Oct 31, 2020 · Updated 5 years ago
- State-of-the-Art Embeddings, Retrieval, and Reranking · 18,853 stars · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) · 14,379 stars · Oct 27, 2025 · Updated 8 months ago
- An open-source NLP research library, built on PyTorch. · 11,890 stars · Nov 22, 2022 · Updated 3 years ago
- Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes. · 31,209 stars · Jun 10, 2026 · Updated 2 weeks ago
- A library for efficient similarity search and clustering of dense vectors. · 40,378 stars · Jun 21, 2026 · Updated last week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" · 6,528 stars · Jan 14, 2026 · Updated 5 months ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… · 3,426 stars · Jun 22, 2026 · Updated last week
- Simple, safe way to store and distribute tensors · 3,790 stars · Jun 19, 2026 · Updated last week
- Rust bindings for the C++ api of PyTorch. · 5,433 stars · May 17, 2026 · Updated last month
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… · 32,575 stars · Jun 23, 2026 · Updated last week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · 42,586 stars · Updated this week
- Extremely fast Query Engine for DataFrames, written in Rust · 38,879 stars · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more · 35,930 stars · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python · 33,697 stars · May 19, 2026 · Updated last month
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. · 15,487 stars · Updated this week
- 🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code · 2,826 stars · Jun 23, 2023 · Updated 3 years ago
- Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust · 15,473 stars · Updated this week
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… · 22,957 stars · Jul 28, 2024 · Updated last year
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators · 2,367 stars · Mar 23, 2024 · Updated 2 years ago
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. · 18,577 stars · May 24, 2026 · Updated last month
- TensorFlow code and pre-trained models for BERT · 40,049 stars · Jul 23, 2024 · Updated last year
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. · 21,337 stars · Updated this week
- Large Language Model Text Generation Inference · 10,862 stars · Mar 21, 2026 · Updated 3 months ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages · 7,815 stars · Jun 23, 2026 · Updated last week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models · 6,153 stars · Jun 24, 2024 · Updated 2 years ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities · 22,152 stars · Jan 23, 2026 · Updated 5 months ago
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. · 43,025 stars · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. · 8,286 stars · Jun 22, 2026 · Updated last week
- Rust bindings for the Python interpreter · 15,835 stars · Jun 22, 2026 · Updated last week
- A Rust machine learning framework. · 4,687 stars · May 30, 2026 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs · 83,677 stars · Updated this week