huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
☆9,364 · Updated this week
Alternatives and similar repositories for tokenizers:
Users who are interested in tokenizers compare it to the libraries listed below.
- Unsupervised text tokenizer for Neural Network-based text generation. ☆10,580 · Updated this week
- An open-source NLP research library, built on PyTorch. ☆11,798 · Updated 2 years ago
- State-of-the-Art Text Embeddings ☆15,951 · Updated this week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ☆7,120 · Updated last year
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ☆6,262 · Updated 4 months ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,051 · Updated last week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆30,954 · Updated last month
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ☆7,366 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ☆8,306 · Updated this week
- Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes. ☆28,965 · Updated last week
- Trax: Deep Learning with Clear Code and Speed ☆8,159 · Updated last week
- Ongoing research training transformer models at scale ☆11,318 · Updated this week
- Development repository for the Triton language and compiler ☆14,360 · Updated this week
- XLNet: Generalized Autoregressive Pretraining for Language Understanding ☆6,183 · Updated last year
- Papers & presentation materials from Hugging Face's internal science day ☆2,046 · Updated 4 years ago
- 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools ☆19,605 · Updated this week
- Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conve… ☆4,142 · Updated 8 months ago
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ☆20,715 · Updated last week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆31,236 · Updated this week
- A library for efficient similarity search and clustering of dense vectors. ☆32,944 · Updated this week
- Language-Agnostic SEntence Representations ☆3,615 · Updated 9 months ago
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ☆6,817 · Updated last month
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations ☆3,263 · Updated last year
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ☆10,498 · Updated last year
- PyTorch extensions for high performance and large scale training. ☆3,254 · Updated last month
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training ☆21,310 · Updated 5 months ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,900 · Updated 2 years ago
- 💫 Industrial-strength Natural Language Processing (NLP) in Python ☆30,894 · Updated last week
- The implementation of DeBERTa ☆2,035 · Updated last year
- Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk ☆13,479 · Updated 6 months ago