shivance / minbpe.c
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, written in pure C.
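For context, the core of byte-level BPE training is a simple loop: count every adjacent pair of token ids, merge the most frequent pair into a new id, and repeat. The sketch below is a minimal, self-contained illustration of that loop in C under simplifying assumptions (a small fixed-size pair-count table instead of hashing); its names and structure are my own and not taken from minbpe.c itself.

```c
/* Minimal sketch of the BPE training loop over raw bytes.
 * Illustrative only; not the actual minbpe.c implementation. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define VOCAB 512  /* ids 0..255 are raw bytes; 256.. are merged tokens */

/* Find the most frequent adjacent pair in ids[0..n); return its count. */
static int most_frequent_pair(const int *ids, size_t n, int *a, int *b) {
    static int counts[VOCAB * VOCAB];      /* flat pair-count table */
    memset(counts, 0, sizeof counts);
    int best = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        int c = ++counts[ids[i] * VOCAB + ids[i + 1]];
        if (c > best) { best = c; *a = ids[i]; *b = ids[i + 1]; }
    }
    return best;
}

/* Replace every (a, b) pair in ids with `id`, in place; return new length. */
static size_t merge(int *ids, size_t n, int a, int b, int id) {
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n && ids[i] == a && ids[i + 1] == b) {
            ids[j++] = id;
            i++;  /* skip the second element of the merged pair */
        } else {
            ids[j++] = ids[i];
        }
    }
    return j;
}

int main(void) {
    const char *text = "aaabdaaabac";  /* classic BPE example string */
    size_t n = strlen(text);
    int ids[64];
    for (size_t i = 0; i < n; i++) ids[i] = (unsigned char)text[i];

    /* Train a handful of merges, printing each one. */
    for (int next_id = 256; next_id < 256 + 3; next_id++) {
        int a, b;
        if (most_frequent_pair(ids, n, &a, &b) < 2) break;
        printf("merge (%d, %d) -> %d\n", a, b, next_id);
        n = merge(ids, n, a, b, next_id);
    }
    printf("final length: %zu\n", n);
    return 0;
}
```

On the example string this prints three merges, starting with the byte pair ('a', 'a') -> 256, and compresses the 11-byte input to a sequence of length 5.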
Related projects:
- Collection of autoregressive model implementations
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
- Make Triton easier
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
- A JAX-like function transformation engine, but micro: microjax
- Simple and fast low-bit matmul kernels in CUDA
- Code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
- A place to store reusable transformer components of my own creation or found on the interwebs
- ML/DL Math and Method notes
- LLM training in simple, raw C/CUDA
- An experiment in using Tangent to autodiff Triton
- A byte-level decoder architecture that matches the performance of tokenized Transformers.
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
- GPU benchmark
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
- Andrej Karpathy's micrograd implemented in C
- Custom kernels in Triton language for accelerating LLMs
- ring-attention experiments
- Prune transformer layers