VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆966 · Updated last year
Alternatives and similar repositories for YouTokenToMe:
Users who are interested in YouTokenToMe also compare it with the libraries listed below.
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,212 · Updated 7 months ago
- Fast BPE ☆671 · Updated 10 months ago
- Tools for shrinking fastText models (in gensim format) ☆178 · Updated last year
- Python port of Moses tokenizer, truecaser and normalizer ☆495 · Updated 11 months ago
- FastFormers - highly efficient transformer models for NLU ☆705 · Updated last month
- Calculates Word Mover's Distance Insanely Fast ☆462 · Updated last year
- Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models. ☆1,144 · Updated last year
- A tool for holistic analysis of language generation systems ☆468 · Updated 3 years ago
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy ☆1,383 · Updated 3 months ago
- LASER multilingual sentence embeddings as a pip package ☆223 · Updated last year
- ☆322 · Updated 2 years ago
- ☆505 · Updated last year
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,110 · Updated 3 years ago
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an… ☆557 · Updated 3 years ago
- Super easy library for BERT-based NLP models ☆1,897 · Updated 8 months ago
- Transformer language model (GPT-2) with sentencepiece tokenizer ☆164 · Updated 4 years ago
- Repository for the paper "Optimal Subarchitecture Extraction for BERT" ☆472 · Updated 2 years ago
- Modern spell-checking library - accurate, fast, multi-language ☆635 · Updated 8 months ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆731 · Updated 8 months ago
- xfspell: the Transformer Spell Checker ☆190 · Updated 4 years ago
- A list of pretrained Transformer models for the Russian language. ☆174 · Updated 5 years ago
- Byte Pair Encoding for Python! ☆228 · Updated 2 years ago
- MASS: Masked Sequence to Sequence Pre-training for Language Generation ☆1,119 · Updated 2 years ago
- A fast, efficient universal vector embedding utility package. ☆1,647 · Updated last year
- ⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x. ☆578 · Updated 2 years ago
- 🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP ☆1,194 · Updated last year
- Models for automatic abstractive summarization ☆171 · Updated 2 years ago
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆506 · Updated 5 years ago
- Fast topic modeling platform ☆669 · Updated last year
- jiant is an NLP toolkit ☆1,666 · Updated last year