VKCOM/YouTokenToMe

Unsupervised text tokenizer focused on computational efficiency

☆979

Alternatives and similar repositories for YouTokenToMe

Users that are interested in YouTokenToMe are comparing it to the libraries listed below.

glample / fastBPE
Fast BPE
☆677 · Jun 18, 2024 · Updated 2 years ago
google / sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
☆11,971 · Updated this week
vlarine / transformers-ru
A list of pretrained Transformer models for the Russian language.
☆176 · Feb 3, 2020 · Updated 6 years ago
rsennrich / subword-nmt
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
☆2,271 · Aug 7, 2024 · Updated last year
bheinzerling / bpemb
Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE)
☆1,222 · Oct 1, 2024 · Updated last year
facebookresearch / LASER
Language-Agnostic SEntence Representations
☆3,661 · May 2, 2024 · Updated 2 years ago
facebookresearch / XLM
PyTorch original implementation of Cross-lingual Language Model Pretraining.
☆2,926 · Feb 14, 2023 · Updated 3 years ago
RossiyaSegodnya / ria_news_dataset
"Rossiya Segodnya" news dataset
☆46 · Sep 25, 2019 · Updated 6 years ago
harvardnlp / pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
☆1,132 · Apr 20, 2022 · Updated 4 years ago
facebookresearch / MUSE
A library for Multilingual Unsupervised or Supervised word Embeddings
☆3,248 · Aug 31, 2022 · Updated 3 years ago
natasha / razdel
Rule-based token and sentence segmentation for the Russian language
☆286 · Apr 13, 2026 · Updated 3 months ago
bigartm / bigartm
Fast topic modeling platform
☆675 · Feb 5, 2026 · Updated 5 months ago
huggingface / tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
☆10,898 · Updated this week
belskikh / kekas
Just another DL library
☆183 · Mar 9, 2021 · Updated 5 years ago
flairNLP / flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
☆14,384 · Oct 27, 2025 · Updated 8 months ago
guillaume-be / SentencePiece-Rust-example
Supporting example for "A Rust SentencePiece implementation"
☆20 · Jun 7, 2020 · Updated 6 years ago
catalyst-team / catalyst
Accelerated deep learning R&D
☆3,376 · Jul 8, 2026 · Updated last week
hplt-project / sacremoses
Python port of Moses tokenizer, truecaser and normalizer
☆497 · Feb 6, 2026 · Updated 5 months ago
nyu-mll / jiant
jiant is an NLP toolkit
☆1,675 · Jul 6, 2023 · Updated 3 years ago
neulab / compare-mt
A tool for holistic analysis of language generation systems
☆471 · Sep 22, 2025 · Updated 9 months ago
vlarine / ruberta
Russian RoBERTa
☆31 · Nov 29, 2019 · Updated 6 years ago
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,245 · Sep 30, 2025 · Updated 9 months ago
huggingface / naacl_transfer_learning_tutorial
Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA
☆723 · Oct 16, 2019 · Updated 6 years ago
appvision-ai / fast-bert
Super easy library for BERT-based NLP models
☆1,918 · Aug 19, 2024 · Updated last year
snakers4 / open_stt
Open STT
☆826 · Mar 11, 2022 · Updated 4 years ago
plasticityai / magnitude
A fast, efficient universal vector embedding utility package.
☆1,665 · Aug 3, 2023 · Updated 2 years ago
facebookresearch / adaptive-span
Transformer training code for sequential tasks
☆610 · Sep 14, 2021 · Updated 4 years ago
ppleskov / Russian-Language-Model
☆56 · May 12, 2018 · Updated 8 years ago
facebookresearch / SentEval
A python tool for evaluating the quality of sentence embeddings.
☆2,110 · Mar 19, 2024 · Updated 2 years ago
facebookresearch / SentAugment
SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c…
☆359 · Feb 22, 2022 · Updated 4 years ago
MyLtYkRiTiK / dl_in_nlp_2019
Taking the Stanford CS224n course together, with support from the iPavlov team.
☆97 · Mar 28, 2019 · Updated 7 years ago
microsoft / fastformers
FastFormers - highly efficient transformer models for NLU
☆706 · Mar 21, 2025 · Updated last year
marcotcr / checklist
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
☆2,051 · Jan 9, 2024 · Updated 2 years ago
feedly / transfer-nlp
NLP library designed for reproducible experimentation management
☆294 · Jul 25, 2024 · Updated last year
VProv / BPE-Dropout
An official implementation of "BPE-Dropout: Simple and Effective Subword Regularization" algorithm.
☆54 · Feb 17, 2021 · Updated 5 years ago
facebookresearch / unlikelihood_training
Neural Text Generation with Unlikelihood Training
☆311 · Aug 31, 2021 · Updated 4 years ago
natasha / slovnet
Deep-learning-based NLP modeling for the Russian language
☆248 · Jul 24, 2023 · Updated 2 years ago
deeppavlov / DeepPavlov
An open source library for deep learning end-to-end dialog systems and chatbots.
☆6,990 · Aug 6, 2025 · Updated 11 months ago
allenai / tpu_pretrain
LM Pretraining with PyTorch/TPU
☆137 · Oct 24, 2019 · Updated 6 years ago