VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆966 Updated last year
Alternatives and similar repositories for YouTokenToMe:
Users who are interested in YouTokenToMe are comparing it to the libraries listed below:
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,205 Updated 5 months ago
- Fast BPE ☆668 Updated 9 months ago
- Language-Agnostic SEntence Representations ☆3,629 Updated 10 months ago
- jiant is an NLP toolkit ☆1,664 Updated last year
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations ☆781 Updated 10 months ago
- A Python tool for evaluating the quality of sentence embeddings. ☆2,101 Updated last year
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,904 Updated 2 years ago
- ☆501 Updated last year
- FastFormers - highly efficient transformer models for NLU ☆704 Updated last week
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty… ☆642 Updated 2 years ago
- A framework to learn cross-lingual word embedding mappings ☆647 Updated last year
- A tool for holistic analysis of language generation systems ☆467 Updated 3 years ago
- Simple, fast unsupervised word aligner ☆750 Updated 2 years ago
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation ☆2,228 Updated 7 months ago
- Super easy library for BERT based NLP models ☆1,890 Updated 7 months ago
- Fast topic modeling platform ☆668 Updated last year
- Tools for shrinking fastText models (in gensim format) ☆178 Updated 10 months ago
- MASS: Masked Sequence to Sequence Pre-training for Language Generation ☆1,117 Updated 2 years ago
- General purpose unsupervised sentence representations ☆1,202 Updated 2 years ago
- LASER multilingual sentence embeddings as a pip package ☆224 Updated last year
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy ☆733 Updated 7 months ago
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons ☆1,114 Updated 2 weeks ago
- The website for the CMU Language Technologies Institute low resource NLP bootcamp 2020 ☆601 Updated 4 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,112 Updated 2 years ago
- Calculates Word Mover's Distance Insanely Fast ☆461 Updated last year
- Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models. ☆1,141 Updated last year
- 🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP ☆1,193 Updated last year
- Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA ☆722 Updated 5 years ago
- BERT-NER (nert-bert) with Google BERT https://github.com/google-research. ☆407 Updated 5 years ago
- NeuSpell: A Neural Spelling Correction Toolkit ☆691 Updated last year