google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
★11,490 · Updated this week
Alternatives and similar repositories for sentencepiece
Users who are interested in sentencepiece are comparing it to the libraries listed below.
- Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" ★6,457 · Updated last month
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ★10,279 · Updated last week
- Ongoing research training transformer models at scale ★14,493 · Updated this week
- An open-source NLP research library, built on PyTorch. ★11,887 · Updated 3 years ago
- State-of-the-Art Text Embeddings ★17,985 · Updated this week
- Open Source Neural Machine Translation and (Large) Language Models in PyTorch ★6,977 · Updated last month
- Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages ★7,681 · Updated this week
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ★14,333 · Updated last month
- 🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (i… ★9,348 · Updated last week
- BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.) ★7,814 · Updated 6 months ago
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation ★2,259 · Updated last year
- Accessible large language models via k-bit quantization for PyTorch. ★7,801 · Updated last week
- A library for efficient similarity search and clustering of dense vectors. ★38,301 · Updated last week
- XLNet: Generalized Autoregressive Pretraining for Language Understanding ★6,181 · Updated 2 years ago
- Code and model for the paper "Improving Language Understanding by Generative Pre-Training" ★2,262 · Updated 6 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ★32,020 · Updated 2 months ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ★2,921 · Updated 2 years ago
- Data augmentation for NLP ★4,635 · Updated last year
- Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities ★21,866 · Updated 5 months ago
- Google AI 2018 BERT pytorch implementation ★6,507 · Updated 2 years ago
- Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings ★7,141 · Updated 4 months ago
- Train transformer language models with reinforcement learning. ★16,552 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ★40,961 · Updated this week
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ★16,717 · Updated 2 months ago
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations ★3,274 · Updated 2 years ago
- A framework for training and evaluating AI models on a variety of openly available dialogue datasets. ★10,628 · Updated 2 years ago
- ★2,915 · Updated 3 weeks ago
- The implementation of DeBERTa ★2,177 · Updated 2 years ago
- Fast and memory-efficient exact attention ★20,904 · Updated last week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ★20,215 · Updated last week