VKCOM / YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
☆953Updated 5 months ago
Related projects: ⓘ
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)☆1,177Updated 6 months ago
- Fast BPE☆652Updated 3 months ago
- PyTorch original implementation of Cross-lingual Language Model Pretraining.☆2,872Updated last year
- jiant is an nlp toolkit☆1,637Updated last year
- Plug and Play Language Model implementation. Allows to steer topic and attributes of GPT-2 models.☆1,128Updated 7 months ago
- Language-Agnostic SEntence Representations☆3,576Updated 4 months ago
- Super easy library for BERT based NLP models☆1,853Updated last month
- Modern spell checking library - accurate, fast, multi-language☆605Updated 3 weeks ago
- A python tool for evaluating the quality of sentence embeddings.☆2,081Updated 6 months ago
- 🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy☆1,339Updated 3 months ago
- FastFormers - highly efficient transformer models for NLU☆702Updated 8 months ago
- 💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy☆724Updated last month
- 🌊HMTL: Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP☆1,189Updated last year
- This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, an…☆545Updated 2 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch☆1,105Updated 2 years ago
- Basic Utilities for PyTorch Natural Language Processing (NLP)☆2,209Updated last year
- MASS: Masked Sequence to Sequence Pre-training for Language Generation☆1,116Updated last year
- A framework to learn cross-lingual word embedding mappings☆642Updated last year
- The Schema-Guided Dialogue Dataset☆539Updated last year
- GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors☆482Updated 4 years ago
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation☆2,181Updated last month
- A fast, efficient universal vector embedding utility package.☆1,623Updated last year
- Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons☆1,040Updated last month
- Python port of Moses tokenizer, truecaser and normalizer☆486Updated 3 months ago
- Calculates Word Mover's Distance Insanely Fast☆461Updated last year
- Simple, fast unsupervised word aligner☆732Updated 2 years ago
- NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations☆770Updated 4 months ago
- XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…☆629Updated last year
- Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA☆718Updated 4 years ago
- Neural machine translation and sequence learning using TensorFlow☆1,450Updated 11 months ago