VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆972 Updated last year
Alternatives and similar repositories for YouTokenToMe
Users interested in YouTokenToMe are comparing it to the libraries listed below.
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,215 Updated 9 months ago
- Fast topic modeling platform ☆668 Updated last month
- A list of pretrained Transformer models for the Russian language. ☆174 Updated 5 years ago
- Fast BPE ☆670 Updated last year
- Modern spell checking library - accurate, fast, multi-language ☆641 Updated 10 months ago
- Tools for shrinking fastText models (in gensim format) ☆178 Updated last year
- ☆36 Updated 2 years ago
- Language-Agnostic SEntence Representations ☆3,647 Updated last year
- FastFormers - highly efficient transformer models for NLU ☆705 Updated 3 months ago
- Python port of Moses tokenizer, truecaser and normalizer ☆495 Updated last year
- ☆55 Updated 7 years ago
- Models for automatic abstractive summarization ☆171 Updated 3 years ago
- Morphological analyzer for Russian and English languages based on neural networks and dictionary-lookup systems. ☆154 Updated last year
- Web-ify your word2vec: framework to serve distributional semantic models online ☆200 Updated 4 months ago
- ☆83 Updated 2 years ago
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆786 Updated last year
- Deep NLP Course ☆629 Updated 5 years ago
- Taking together Stanford cs224n course with support of iPavlov team. ☆97 Updated 6 years ago
- jiant is an nlp toolkit ☆1,670 Updated 2 years ago
- Rule-based token, sentence segmentation for Russian language ☆267 Updated last year
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,114 Updated 3 years ago
- LASER multilingual sentence embeddings as a pip package ☆224 Updated last year
- UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files ☆383 Updated 7 months ago
- ☆514 Updated last year
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors ☆509 Updated 5 years ago
- Byte Pair Encoding for Python! ☆229 Updated 2 years ago
- 🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP ☆1,196 Updated last year
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆645 Updated 2 years ago
- A tool for holistic analysis of language generation systems ☆469 Updated 3 years ago
- Links to Russian corpora + Python functions for loading and parsing ☆299 Updated last year