huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
☆10,119 · Updated last week
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- Unsupervised text tokenizer for Neural Network-based text generation. ☆11,345 · Updated last week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,300 · Updated last month
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆31,861 · Updated 2 weeks ago
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆9,199 · Updated this week
- 🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools ☆20,721 · Updated this week
- State-of-the-Art Text Embeddings ☆17,685 · Updated this week
- An open-source NLP research library, built on PyTorch. ☆11,879 · Updated 2 years ago
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ☆6,435 · Updated 5 months ago
- Ongoing research training transformer models at scale ☆13,824 · Updated this week
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ☆150,927 · Updated this week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ☆7,698 · Updated 4 months ago
- The implementation of DeBERTa ☆2,158 · Updated 2 years ago
- Papers & presentation materials from Hugging Face's internal science day ☆2,050 · Updated 4 years ago
- XLNet: Generalized Autoregressive Pretraining for Language Understanding ☆6,180 · Updated 2 years ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization… ☆3,115 · Updated this week
- Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve… ☆4,216 · Updated last month
- Language-Agnostic SEntence Representations ☆3,647 · Updated last year
- Serve, optimize and scale PyTorch models in production ☆4,347 · Updated 2 months ago
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ☆7,623 · Updated last week
- Scalable embedding, reasoning, ranking for images and sentences with CLIP ☆12,753 · Updated last year
- Trax - Deep Learning with Clear Code and Speed ☆8,287 · Updated 2 weeks ago
- Large Language Model Text Generation Inference ☆10,566 · Updated 3 weeks ago
- Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...) ☆2,959 · Updated 4 months ago
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ☆6,951 · Updated 7 months ago
- Accessible large language models via k-bit quantization for PyTorch. ☆7,647 · Updated last week
- Fast and memory-efficient exact attention ☆19,864 · Updated last week
- Train transformer language models with reinforcement learning. ☆15,818 · Updated this week
- Data augmentation for NLP ☆4,620 · Updated last year
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ☆15,838 · Updated this week
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,922 · Updated 2 years ago