google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
★11,246 · Updated last week
Alternatives and similar repositories for sentencepiece
Users who are interested in sentencepiece compare it to the libraries listed below.
- An open-source NLP research library, built on PyTorch. ★11,878 · Updated 2 years ago
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production. ★10,060 · Updated 2 weeks ago
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.). ★7,628 · Updated 3 months ago
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". ★6,421 · Updated 4 months ago
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch. ★6,935 · Updated 6 months ago
- State-of-the-Art Text Embeddings. ★17,513 · Updated 2 weeks ago
- XLNet: Generalized Autoregressive Pretraining for Language Understanding. ★6,180 · Updated 2 years ago
- Ongoing research training transformer models at scale. ★13,541 · Updated this week
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,133 · Updated this week
- Models, data loaders and abstractions for language processing, powered by PyTorch. ★3,559 · Updated this week
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★31,777 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★19,570 · Updated this week
- Fast and memory-efficient exact attention. ★19,471 · Updated this week
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation. ★2,251 · Updated last year
- Accessible large language models via k-bit quantization for PyTorch. ★7,567 · Updated this week
- Open standard for machine learning interoperability. ★19,582 · Updated last week
- Code and model for the paper "Improving Language Understanding by Generative Pre-Training". ★2,238 · Updated 6 years ago
- KenLM: Faster and Smaller Language Model Queries. ★2,668 · Updated 5 months ago
- Data augmentation for NLP. ★4,612 · Updated last year
- Train transformer language models with reinforcement learning. ★15,520 · Updated this week
- ★2,880 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. ★37,076 · Updated this week
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities. ★21,713 · Updated 2 months ago
- The implementation of DeBERTa. ★2,146 · Updated last year
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ★14,592 · Updated last month
- A very simple framework for state-of-the-art Natural Language Processing (NLP). ★14,284 · Updated 3 weeks ago
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ★15,911 · Updated 2 weeks ago
- Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings. ★7,091 · Updated last month
- Transformer related optimization, including BERT, GPT. ★6,300 · Updated last year
- A tool for extracting plain text from Wikipedia dumps. ★3,911 · Updated last year