huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

☆9,927

Alternatives and similar repositories for tokenizers

Users who are interested in tokenizers compare it to the libraries listed below.

google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
☆11,105 Updated last week
stanfordnlp / stanza
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
☆7,526 Updated last week
google-research / text-to-text-transfer-transformer
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
☆6,397 Updated 2 months ago
allenai / allennlp
An open-source NLP research library, built on PyTorch.
☆11,861 Updated 2 years ago
jessevig / bertviz
BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)
☆7,549 Updated last month
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆31,656 Updated last month
huggingface / accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i…
☆8,951 Updated last week
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,232 Updated last week
PAIR-code / lit
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic …
☆3,577 Updated last month
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,647 Updated last year
huggingface / datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
☆20,389 Updated last week
UKPLab / sentence-transformers
State-of-the-Art Text Embeddings
☆17,187 Updated this week
openai / tiktoken
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
☆15,146 Updated 4 months ago
pytorch / text
Models, data loaders and abstractions for language processing, powered by PyTorch
☆3,546 Updated this week
google / trax
Trax — Deep Learning with Clear Code and Speed
☆8,241 Updated 3 months ago
Lightning-AI / pytorch-lightning
Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
☆29,850 Updated this week
huggingface / transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model…
☆147,239 Updated this week
explosion / spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
☆32,010 Updated last month
facebookresearch / ParlAI
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
☆10,600 Updated last year
OpenNMT / OpenNMT-py
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
☆6,911 Updated 4 months ago
huggingface / safetensors
Simple, safe way to store and distribute tensors
☆3,356 Updated 3 weeks ago
jina-ai / clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
☆12,703 Updated last year
huggingface / awesome-papers
Papers & presentation materials from Hugging Face's internal science day
☆2,047 Updated 4 years ago
LaurentMazare / tch-rs
Rust bindings for the C++ api of PyTorch.
☆4,924 Updated 2 months ago
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆12,960 Updated this week
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,352 Updated this week
guillaume-be / rust-bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
☆2,905 Updated last month
microsoft / DeBERTa
The implementation of DeBERTa
☆2,121 Updated last year
ThilinaRajapakse / simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve…
☆4,200 Updated 2 months ago
huggingface / evaluate
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
☆2,266 Updated 2 weeks ago