VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆965 Updated 10 months ago
Alternatives and similar repositories for YouTokenToMe:
Users who are interested in YouTokenToMe are comparing it to the libraries listed below.
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,194 Updated 4 months ago
- Fast BPE ☆662 Updated 8 months ago
- Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models. ☆1,138 Updated last year
- A tool for holistic analysis of language generation systems ☆467 Updated 2 years ago
- Python port of Moses tokenizer, truecaser and normalizer ☆489 Updated 8 months ago
- Language-Agnostic SEntence Representations ☆3,617 Updated 9 months ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,900 Updated 2 years ago
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation ☆2,218 Updated 6 months ago
- FastFormers - highly efficient transformer models for NLU ☆704 Updated last year
- Models for automatic abstractive summarization ☆171 Updated 2 years ago
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆780 Updated 9 months ago
- Conditional Transformer Language Model for Controllable Generation ☆1,875 Updated 3 years ago
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons ☆1,101 Updated last month
- Tools for shrinking fastText models (in gensim format) ☆175 Updated 9 months ago
- ☆36 Updated 2 years ago
- A Python tool for evaluating the quality of sentence embeddings. ☆2,093 Updated 11 months ago
- Calculates Word Mover's Distance Insanely Fast ☆460 Updated last year
- 🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP ☆1,192 Updated last year
- Simple, fast unsupervised word aligner ☆744 Updated 2 years ago
- Deep NLP Course ☆631 Updated 5 years ago
- Fast and customizable text tokenization library with BPE and SentencePiece support ☆297 Updated 5 months ago
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators ☆2,347 Updated 10 months ago
- A list of pretrained Transformer models for the Russian language. ☆173 Updated 5 years ago
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆498 Updated 5 years ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆638 Updated 2 years ago
- Transformer language model (GPT-2) with sentencepiece tokenizer ☆163 Updated 3 years ago
- ☆494 Updated last year
- Evaluating Cross-lingual Sentence Representations ☆449 Updated 3 years ago
- jiant is an NLP toolkit ☆1,661 Updated last year
- Fast topic modeling platform ☆662 Updated last year