Unsupervised text tokenizer focused on computational efficiency
☆977 · Mar 29, 2024 · Updated 2 years ago
Alternatives and similar repositories for YouTokenToMe
Users who are interested in YouTokenToMe are comparing it to the libraries listed below.
- Fast BPE ☆678 · Jun 18, 2024 · Updated last year
- Unsupervised text tokenizer for Neural Network-based text generation. ☆11,745 · Updated this week
- A list of pretrained Transformer models for the Russian language. ☆176 · Feb 3, 2020 · Updated 6 years ago
- Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) ☆1,223 · Oct 1, 2024 · Updated last year
- Unsupervised Word Segmentation for Neural Machine Translation and Text Generation ☆2,267 · Aug 7, 2024 · Updated last year
- PyTorch original implementation of Cross-lingual Language Model Pretraining. ☆2,928 · Feb 14, 2023 · Updated 3 years ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,125 · Apr 20, 2022 · Updated 3 years ago
- Language-Agnostic SEntence Representations ☆3,662 · May 2, 2024 · Updated last year
- "Rossiya Segodnya" news dataset ☆46 · Sep 25, 2019 · Updated 6 years ago
- A library for Multilingual Unsupervised or Supervised word Embeddings ☆3,244 · Aug 31, 2022 · Updated 3 years ago
- Fast topic modeling platform ☆673 · Feb 5, 2026 · Updated 2 months ago
- Rule-based token, sentence segmentation for Russian language ☆281 · Jul 24, 2023 · Updated 2 years ago
- 💥 Fast State-of-the-Art Tokenizers optimized for Research and Production ☆10,597 · Apr 2, 2026 · Updated last week
- Just another DL library ☆183 · Mar 9, 2021 · Updated 5 years ago
- A very simple framework for state-of-the-art Natural Language Processing (NLP) ☆14,361 · Oct 27, 2025 · Updated 5 months ago
- Python port of Moses tokenizer, truecaser and normalizer ☆494 · Feb 6, 2026 · Updated 2 months ago
- Supporting example for "A Rust SentencePiece implementation" ☆20 · Jun 7, 2020 · Updated 5 years ago
- Accelerated deep learning R&D ☆3,375 · Jun 27, 2025 · Updated 9 months ago
- jiant is an nlp toolkit ☆1,675 · Jul 6, 2023 · Updated 2 years ago
- A tool for holistic analysis of language generations systems ☆471 · Sep 22, 2025 · Updated 6 months ago
- Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA ☆722 · Oct 16, 2019 · Updated 6 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆32,202 · Sep 30, 2025 · Updated 6 months ago
- A fast, efficient universal vector embedding utility package. ☆1,657 · Aug 3, 2023 · Updated 2 years ago
- Super easy library for BERT based NLP models ☆1,920 · Aug 19, 2024 · Updated last year
- Open STT ☆820 · Mar 11, 2022 · Updated 4 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- A python tool for evaluating the quality of sentence embeddings.