huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
★9,670 · Updated last month
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- Unsupervised text tokenizer for Neural Network-based text generation. ★10,878 · Updated last month
- Ongoing research training transformer models at scale ★12,358 · Updated this week
- State-of-the-Art Text Embeddings ★16,699 · Updated this week
- Large Language Model Text Generation Inference ★10,119 · Updated this week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★7,389 · Updated last year
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★31,435 · Updated 4 months ago
- An open-source NLP research library, built on PyTorch. ★11,847 · Updated 2 years ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,461 · Updated 2 weeks ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★8,708 · Updated this week
- Train transformer language models with reinforcement learning. ★13,703 · Updated this week
- Development repository for the Triton language and compiler ★15,568 · Updated this week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,349 · Updated 2 weeks ago
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★18,371 · Updated this week
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★14,317 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ★7,020 · Updated this week
- 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools ★20,107 · Updated this week
- Fast and memory-efficient exact attention ★17,346 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. ★34,852 · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★14,167 · Updated this week
- Papers & presentation materials from Hugging Face's internal science day ★2,047 · Updated 4 years ago
- Scalable embedding, reasoning, ranking for images and sentences with CLIP ★12,665 · Updated last year
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ★14,461 · Updated 2 months ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ★2,906 · Updated 2 years ago
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ★22,866 · Updated 9 months ago
- Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve… ★4,178 · Updated 3 weeks ago
- Transformer related optimization, including BERT, GPT ★6,152 · Updated last year
- State of the Art Natural Language Processing ★3,965 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. ★29,473 · Updated this week
- Models, data loaders and abstractions for language processing, powered by PyTorch ★3,542 · Updated this week
- Trax - Deep Learning with Clear Code and Speed ★8,204 · Updated last month