huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
⭐ 9,927 · Updated this week
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- Unsupervised text tokenizer for Neural Network-based text generation. ⭐ 11,105 · Updated last week
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ⭐ 7,526 · Updated last week
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ⭐ 6,397 · Updated 2 months ago
- An open-source NLP research library, built on PyTorch. ⭐ 11,861 · Updated 2 years ago
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ⭐ 7,549 · Updated last month
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ⭐ 31,656 · Updated last month
- A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ⭐ 8,951 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ⭐ 14,232 · Updated last week
- The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic … ⭐ 3,577 · Updated last month
- Language-Agnostic SEntence Representations ⭐ 3,647 · Updated last year
- 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools ⭐ 20,389 · Updated last week
- State-of-the-Art Text Embeddings ⭐ 17,187 · Updated this week
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ⭐ 15,146 · Updated 4 months ago
- Models, data loaders and abstractions for language processing, powered by PyTorch ⭐ 3,546 · Updated this week
- Trax: Deep Learning with Clear Code and Speed ⭐ 8,241 · Updated 3 months ago
- Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. ⭐ 29,850 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ⭐ 147,239 · Updated this week
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ⭐ 32,010 · Updated last month
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ⭐ 10,600 · Updated last year
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ⭐ 6,911 · Updated 4 months ago
- Simple, safe way to store and distribute tensors ⭐ 3,356 · Updated 3 weeks ago
- Scalable embedding, reasoning, ranking for images and sentences with CLIP ⭐ 12,703 · Updated last year
- Papers & presentation materials from Hugging Face's internal science day ⭐ 2,047 · Updated 4 years ago
- Rust bindings for the C++ api of PyTorch. ⭐ 4,924 · Updated 2 months ago
- Ongoing research training transformer models at scale ⭐ 12,960 · Updated this week
- Large Language Model Text Generation Inference ⭐ 10,352 · Updated this week
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...) ⭐ 2,905 · Updated last month
- The implementation of DeBERTa ⭐ 2,121 · Updated last year
- Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve… ⭐ 4,200 · Updated 2 months ago
- 🤗 Evaluate: A library for easily evaluating machine learning models and datasets. ⭐ 2,266 · Updated 2 weeks ago