Simple byte pair encoding (BPE) mechanism for tokenization, written purely in C.
☆147 · Nov 11, 2024 · Updated last year
Alternatives and similar repositories for bpe.c
Users that are interested in bpe.c are comparing it to the libraries listed below.
- UNet diffusion model in pure CUDA ☆656 · Jun 28, 2024 · Updated last year
- Fast bare-bones BPE for modern tokenizer training ☆176 · Jun 23, 2025 · Updated 9 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆308 · Jun 11, 2024 · Updated last year
- Teardown of Google Glass ☆39 · Jan 11, 2014 · Updated 12 years ago
- RuLES: a benchmark for evaluating rule-following in language models ☆249 · Feb 24, 2025 · Updated last year
- A basic implementation of convolutional neural nets ☆59 · Apr 20, 2014 · Updated 11 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆598 · Aug 12, 2025 · Updated 7 months ago
- Simple MPI implementation for prototyping or learning ☆305 · Aug 6, 2025 · Updated 7 months ago
- ScriptBots is an Open Source Evolutionary Artificial Life Simulation of Predator-Prey dynamics, written by Andrej Karpathy. ☆63 · Feb 18, 2011 · Updated 15 years ago
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,954 · Oct 8, 2025 · Updated 5 months ago
- My favorite C programming practices. ☆2,151 · Jan 19, 2026 · Updated 2 months ago
- Benchmark testbed for assessing the performance of optimisation algorithms ☆86 · Jan 7, 2015 · Updated 11 years ago
- Official repository for our work on micro-budget training of large-scale diffusion models. ☆1,555 · Jan 12, 2025 · Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆2,119 · Aug 26, 2025 · Updated 7 months ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆1,795 · Apr 18, 2025 · Updated 11 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆337 · Nov 2, 2025 · Updated 4 months ago
- A python script to help manage a Gmail inbox by filtering out promotional emails using GPT-3 or GPT-4. ☆458 · Dec 2, 2023 · Updated 2 years ago
- Tile primitives for speedy kernels ☆3,244 · Mar 17, 2026 · Updated last week
- ScriptBots is an Open Source Evolutionary Artificial Life Simulation of Predator-Prey dynamics, written by Andrej Karpathy. ☆164 · Jan 2, 2012 · Updated 14 years ago
- JavaScript with Batteries Included for Google Glass ☆218 · Jul 10, 2016 · Updated 9 years ago
- gpt-2 from scratch in mlx ☆418 · Jun 12, 2024 · Updated last year
- Game making library for using Canvas element ☆95 · Oct 17, 2023 · Updated 2 years ago
- Implements SFO minibatch optimizer in Python and MATLAB, and reproduces figures from paper. ☆134 · May 17, 2021 · Updated 4 years ago
- C++ TensorRT Implementation of NanoSAM ☆51 · Dec 28, 2023 · Updated 2 years ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆954 · Nov 16, 2025 · Updated 4 months ago
- NanoGPT (124M) in 2 minutes ☆5,003 · Mar 17, 2026 · Updated last week
- Schedule-Free Optimization in PyTorch ☆2,265 · May 21, 2025 · Updated 10 months ago
- ☆16 · Feb 25, 2026 · Updated last month
- Google Mirror API's Quickstart for Python ☆350 · Jun 13, 2021 · Updated 4 years ago
- Custom triton kernels for training Karpathy's nanoGPT. ☆19 · Oct 21, 2024 · Updated last year
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,625 · Sep 10, 2025 · Updated 6 months ago
- A pitch detection model trained to be robust against noise and reverberation environments. ☆27 · Jan 21, 2025 · Updated last year
- A PyTorch native platform for training generative AI models ☆5,162 · Mar 20, 2026 · Updated last week
- [UNMAINTAINED] Histogram of Oriented Gradients (HOG) descriptor extractor ☆172 · Mar 1, 2015 · Updated 11 years ago
- The simplest, fastest repository for training/finetuning small-sized VLMs. ☆4,738 · Oct 27, 2025 · Updated 5 months ago
- Implementation for MatMul-free LM. ☆3,059 · Dec 2, 2025 · Updated 3 months ago
- This repo is text to speech with learnable audio encoder without alignment with transcript reference ☆54 · Sep 20, 2025 · Updated 6 months ago
- The working draft to split rocket core out from rocket chip ☆14 · Dec 22, 2023 · Updated 2 years ago
- Minimal reproduction of DeepSeek R1-Zero ☆12,963 · Feb 27, 2026 · Updated last month