huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
★9,817 Updated this week
Alternatives and similar repositories for tokenizers
Users who are interested in tokenizers are comparing it to the libraries listed below.
- State-of-the-Art Text Embeddings ★16,947 Updated last week
- Unsupervised text tokenizer for Neural Network-based text generation. ★10,994 Updated 2 months ago
- XLNet: Generalized Autoregressive Pretraining for Language Understanding ★6,182 Updated 2 years ago
- An open-source NLP research library, built on PyTorch. ★11,851 Updated 2 years ago
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,372 Updated last month
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★31,543 Updated last week
- A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★8,839 Updated this week
- A natural language modeling framework based on PyTorch ★6,325 Updated 2 years ago
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ★6,891 Updated 3 months ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★14,207 Updated this week
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,488 Updated last week
- 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools ★20,270 Updated last week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★7,473 Updated 2 weeks ago
- Scalable embedding, reasoning, ranking for images and sentences with CLIP ★12,686 Updated last year
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ★10,559 Updated last year
- Papers & presentation materials from Hugging Face's internal science day ★2,047 Updated 4 years ago
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… ★14,800 Updated this week
- Ongoing research training transformer models at scale ★12,600 Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★21,420 Updated 2 weeks ago
- Trax - Deep Learning with Clear Code and Speed ★8,223 Updated 2 months ago
- A library for efficient similarity search and clustering of dense vectors. ★35,530 Updated last week
- Language-Agnostic SEntence Representations ★3,644 Updated last year
- TensorFlow code and pre-trained models for BERT ★39,226 Updated 10 months ago
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ★32,533 Updated this week
- Large Language Model Text Generation Inference ★10,236 Updated this week
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ★13,808 Updated 10 months ago
- 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal model… ★145,689 Updated this week
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ★2,356 Updated last year
- Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the mo… ★22,889 Updated 10 months ago
- Models, data loaders and abstractions for language processing, powered by PyTorch ★3,544 Updated this week