srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,279 Updated 10 months ago
Alternatives and similar repositories for GPU-Puzzles
Users that are interested in GPU-Puzzles are comparing it to the libraries listed below
- Solve puzzles. Improve your pytorch. ☆3,651 Updated last year
- Material for gpu-mode lectures ☆4,752 Updated last month
- Machine Learning Engineering Open Book ☆14,454 Updated this week
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,591 Updated 11 months ago
- The full minitorch student suite. ☆2,131 Updated 11 months ago
- Puzzles for learning Triton ☆1,769 Updated 8 months ago
- GPU programming related news and material links ☆1,625 Updated 6 months ago
- Tile primitives for speedy kernels ☆2,523 Updated last week
- A Python framework for accelerated simulation, data generation and spatial computing. ☆5,319 Updated this week
- NanoGPT (124M) in 3 minutes ☆2,851 Updated last week
- This project is a stock trend prediction web application created using Python and Streamlit. The purpose of this web application is to al… ☆10 Updated 2 years ago
- A PyTorch native platform for training generative AI models ☆4,093 Updated this week
- LLM training in simple, raw C/CUDA ☆27,176 Updated 3 weeks ago
- A lightweight library for portable low-level GPU computation using WebGPU. ☆3,880 Updated last week
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆12,350 Updated 11 months ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆9,761 Updated last year
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆6,033 Updated 3 months ago
- Efficient Triton Kernels for LLM Training ☆5,390 Updated this week
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,804 Updated last month
- What would you do with 1000 H100s... ☆1,064 Updated last year
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more ☆32,853 Updated this week
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,659 Updated 3 months ago
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,370 Updated this week
- llama3 implementation one matrix multiplication at a time ☆15,050 Updated last year
- CUDA Python: Performance meets Productivity ☆2,835 Updated this week
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,042 Updated 3 weeks ago
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,652 Updated this week
- Video+code lecture on building nanoGPT from scratch ☆4,228 Updated 11 months ago
- An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management. ☆5,742 Updated 2 weeks ago
- ☆1,289 Updated 3 weeks ago