dshah3 / GPU-PuzzlesLinks

Solve puzzles. Learn CUDA.

☆64

Alternatives and similar repositories for GPU-Puzzles

Users that are interested in GPU-Puzzles are comparing it to the libraries listed below

Sorting:

gpu-mode / profiling-cuda-in-torch
☆162Updated last year
cloneofsimo / min-fsdp
☆82Updated last year
google-deepmind / nanodo
☆274Updated last year
vdesai2014 / inference-optimization-blog-post
☆88Updated last year
srush / Transformer-Puzzles
Puzzles for exploring transformers
☆355Updated 2 years ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 7 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130Updated last year
MatX-inc / seqax
seqax = sequence modeling + JAX
☆165Updated last week
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆219Updated last year
joey00072 / Tinytorch
A really tiny autograd engine
☆95Updated 2 months ago
srush / GPTWorld
A puzzle to learn about prompting
☆132Updated 2 years ago
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆137Updated last year
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆55Updated last year
imbue-ai / carbs
Cost aware hyperparameter tuning algorithm
☆166Updated last year
dvruette / barrel-rec-pytorch
☆53Updated last year
srush / Autodiff-Puzzles
☆443Updated 9 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆138Updated last year
HomebrewML / HeavyBall
Efficient optimizers
☆252Updated last week
apple / ml-sigma-reparam
☆304Updated last year
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆79Updated last year
Jaykef / Triton-nanoGPT
Custom triton kernels for training Karpathy's nanoGPT.
☆19Updated 9 months ago
jax-ml / jax-llm-examples
☆137Updated last week
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated last week
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆86Updated last year
srush / annotated-mamba
Annotated version of the Mamba paper
☆487Updated last year
young-geng / scalax
A simple library for scaling up JAX programs
☆140Updated 9 months ago
cloneofsimo / scaling-guide
WIP
☆93Updated 11 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated last month
gautierdag / bpeasy
Fast bare-bones BPE for modern tokenizer training
☆160Updated last month
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆189Updated last year