VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆959 Updated 7 months ago
Related projects
Alternatives and complementary repositories for YouTokenToMe
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,187 Updated last month
- Fast BPE ☆656 Updated 5 months ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,108 Updated 2 years ago
- Super easy library for BERT-based NLP models ☆1,866 Updated 3 months ago
- Language-Agnostic SEntence Representations ☆3,600 Updated 6 months ago
- Original PyTorch implementation of Cross-lingual Language Model Pretraining. ☆2,892 Updated last year
- Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models. ☆1,132 Updated 9 months ago
- jiant is an NLP toolkit ☆1,647 Updated last year
- A tool for holistic analysis of language generation systems ☆467 Updated 2 years ago
- Modern spell checking library - accurate, fast, multi-language ☆613 Updated 2 months ago
- ☆487 Updated 9 months ago
- A Python tool for evaluating the quality of sentence embeddings. ☆2,087 Updated 8 months ago
- Minimalist NMT for educational purposes ☆678 Updated 9 months ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆725 Updated 3 months ago
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆777 Updated 6 months ago
- A list of pretrained Transformer models for the Russian language. ☆174 Updated 4 years ago
- Python port of Moses tokenizer, truecaser and normalizer ☆488 Updated 5 months ago
- The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020 ☆598 Updated 4 years ago
- Tools for shrinking fastText models (in gensim format) ☆173 Updated 6 months ago
- LASER multilingual sentence embeddings as a pip package ☆225 Updated last year
- Transformer language model (GPT-2) with SentencePiece tokenizer ☆164 Updated 3 years ago
- A Visual Analysis Tool to Explore Learned Representations in Transformer Models ☆584 Updated 9 months ago
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy ☆1,352 Updated 5 months ago
- ☆320 Updated last year
- FastFormers - highly efficient transformer models for NLU ☆701 Updated 10 months ago
- Evaluating Cross-lingual Sentence Representations ☆442 Updated 3 years ago
- Fast Neural Machine Translation in C++ ☆1,254 Updated last year
- ⚡ Boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆566 Updated last year
- An Analysis Toolkit for Natural Language Generation (Translation, Captioning, Summarization, etc.) ☆443 Updated 3 weeks ago
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation ☆2,197 Updated 3 months ago