Fast bare-bones BPE for modern tokenizer training
☆176 Jun 23, 2025 Updated 9 months ago
Alternatives and similar repositories for bpeasy
Users that are interested in bpeasy are comparing it to the libraries listed below.
- Simple byte pair encoding mechanism used for the tokenization process, written purely in C ☆148 Nov 11, 2024 Updated last year
- JAX implementation ViT-VQGAN ☆63 Jul 23, 2022 Updated 3 years ago
- Google+ Blog ☆15 Oct 9, 2011 Updated 14 years ago
- UNet diffusion model in pure CUDA ☆656 Jun 28, 2024 Updated last year
- RuLES: a benchmark for evaluating rule-following in language models ☆249 Feb 24, 2025 Updated last year
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆1,049 Apr 27, 2025 Updated 11 months ago
- ScriptBots is an Open Source Evolutionary Artificial Life Simulation of Predator-Prey dynamics, written by Andrej Karpathy. ☆63 Feb 18, 2011 Updated 15 years ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 Aug 5, 2024 Updated last year
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆10,399 Jul 1, 2024 Updated last year
- ####### ALERT! #########: my fork of the project has moved: ☆17 Dec 23, 2016 Updated 9 years ago
- Teardown of Google Glass ☆39 Jan 11, 2014 Updated 12 years ago
- ☆19 Sep 16, 2025 Updated 6 months ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆22 Feb 14, 2024 Updated 2 years ago
- A basic implementation of convolutional neural nets ☆59 Apr 20, 2014 Updated 11 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆598 Aug 12, 2025 Updated 7 months ago
- Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll… ☆27 May 16, 2024 Updated last year
- useful scripts to work with Twitter + Python. Requires the tweepy library. ☆86 Nov 29, 2012 Updated 13 years ago
- Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch ☆29 Mar 22, 2026 Updated last week
- 0-Shot Tokenizer Transplant ☆14 May 16, 2025 Updated 10 months ago
- BPE modification that implements removing of the intermediate tokens during tokenizer training. ☆27 Nov 25, 2024 Updated last year
- Extracts plain text, language identification and more metadata from WARC records ☆23 Oct 1, 2025 Updated 5 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆28 Apr 17, 2024 Updated last year
- Code for Zero-Shot Tokenizer Transfer ☆143 Jan 14, 2025 Updated last year
- Schedule-Free Optimization in PyTorch ☆2,265 May 21, 2025 Updated 10 months ago
- ☆16 Apr 4, 2022 Updated 3 years ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆956 Nov 16, 2025 Updated 4 months ago
- Google Mirror API's Quickstart for Python ☆350 Jun 13, 2021 Updated 4 years ago
- Supervoice diffusion enhance ☆28 Jul 15, 2024 Updated last year
- ☆10 Oct 2, 2024 Updated last year
- Experimental CUDA kernel framework unifying typed dimensions, NVRTC JIT specialization, and ML‑guided tuning. ☆46 Feb 9, 2026 Updated last month
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆724 Oct 11, 2023 Updated 2 years ago
- Measuring if attention is explanation with ROAR ☆22 Mar 3, 2023 Updated 3 years ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆75 Aug 2, 2024 Updated last year
- Ruby Gem that makes sure that only a single instance of a code block is running. ☆16 Mar 13, 2013 Updated 13 years ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆309 Jun 11, 2024 Updated last year
- Code for co-training large language models (e.g. T0) with smaller ones (e.g. BERT) to boost few-shot performance ☆17 Sep 23, 2022 Updated 3 years ago
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,360 Jun 13, 2024 Updated last year
- Using fourier interpolation to merge large language models ☆11 Jan 6, 2026 Updated 2 months ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning ☆13 Aug 17, 2023 Updated 2 years ago