srush / GPU-Puzzles

Solve puzzles. Learn CUDA.

☆11,279

Alternatives and similar repositories for GPU-Puzzles

Users who are interested in GPU-Puzzles are comparing it to the repositories listed below.

srush / Tensor-Puzzles
Solve puzzles. Improve your pytorch.
☆3,651 Updated last year
gpu-mode / lectures
Material for gpu-mode lectures
☆4,752 Updated last month
stas00 / ml-engineering
Machine Learning Engineering Open Book
☆14,454 Updated this week
adam-maj / tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
☆8,591 Updated 11 months ago
minitorch / minitorch
The full minitorch student suite.
☆2,131 Updated 11 months ago
srush / Triton-Puzzles
Puzzles for learning Triton
☆1,769 Updated 8 months ago
gpu-mode / resource-stream
GPU programming related news and material links
☆1,625 Updated 6 months ago
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆2,523 Updated last week
NVIDIA / warp
A Python framework for accelerated simulation, data generation and spatial computing.
☆5,319 Updated this week
KellerJordan / modded-nanogpt
NanoGPT (124M) in 3 minutes
☆2,851 Updated last week
aylazai / Stock-Trend-Prediction
This project is a stock trend prediction web application created using Python and Streamlit. The purpose of this web application is to al…
☆10 Updated 2 years ago
pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆4,093 Updated this week
karpathy / llm.c
LLM training in simple, raw C/CUDA
☆27,176 Updated 3 weeks ago
AnswerDotAI / gpu.cpp
A lightweight library for portable low-level GPU computation using WebGPU.
☆3,880 Updated last week
karpathy / micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
☆12,350 Updated 11 months ago
karpathy / minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
☆9,761 Updated last year
pytorch-labs / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
☆6,033 Updated 3 months ago
linkedin / Liger-Kernel
Efficient Triton Kernels for LLM Training
☆5,390 Updated this week
google-deepmind / penzai
A JAX research toolkit for building, editing, and visualizing neural networks.
☆1,804 Updated last month
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,064 Updated last year
jax-ml / jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
☆32,853 Updated this week
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,659 Updated 3 months ago
openxla / xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
☆3,370 Updated this week
naklecha / llama3-from-scratch
llama3 implementation one matrix multiplication at a time
☆15,050 Updated last year
NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆2,835 Updated this week
arogozhnikov / einops
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
☆9,042 Updated 3 weeks ago
facebookresearch / lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
☆4,652 Updated this week
karpathy / build-nanogpt
Video+code lecture on building nanoGPT from scratch
☆4,228 Updated 11 months ago
XuehaiPan / nvitop
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
☆5,742 Updated 2 weeks ago
Infatoshi / cuda-course
☆1,289 Updated 3 weeks ago