srush / GPU-Puzzles
Solve puzzles. Learn CUDA.
☆11,759 · Updated last year
Alternatives and similar repositories for GPU-Puzzles
Users interested in GPU-Puzzles are comparing it to the libraries listed below.
- Solve puzzles. Improve your PyTorch. ☆3,814 · Updated last year
- The full minitorch student suite. ☆2,224 · Updated last year
- Material for gpu-mode lectures ☆5,355 · Updated last week
- Machine Learning Engineering Open Book ☆15,880 · Updated last week
- Puzzles for learning Triton ☆2,143 · Updated last year
- GPU programming-related news and material links ☆1,803 · Updated 2 months ago
- NanoGPT (124M) in 3 minutes ☆3,911 · Updated last week
- llama3 implementation one matrix multiplication at a time ☆15,191 · Updated last year
- ☆2,207 · Updated last month
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,714 · Updated last week
- Tile primitives for speedy kernels ☆2,955 · Updated this week
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆8,937 · Updated last year
- What would you do with 1000 H100s... ☆1,132 · Updated last year
- Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python. ☆6,162 · Updated 3 months ago
- A PyTorch-native platform for training generative AI models ☆4,778 · Updated this week
- Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs. ☆4,735 · Updated 4 months ago
- This project is a stock trend prediction web application created using Python and Streamlit. The purpose of this web application is to al… ☆10 · Updated 2 years ago
- Explanations of key concepts in ML ☆8,126 · Updated 5 months ago
- An ML Systems Onboarding list ☆944 · Updated 10 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆1,901 · Updated 3 months ago
- Development repository for the Triton language and compiler ☆17,730 · Updated this week
- This repo contains the Hugging Face Deep Reinforcement Learning Course. ☆4,629 · Updated 2 months ago
- CUDA Templates and Python DSLs for High-Performance Linear Algebra ☆8,865 · Updated this week
- LLM training in simple, raw C/CUDA ☆28,257 · Updated 5 months ago
- CUDA Python: Performance meets Productivity ☆3,053 · Updated last week
- PyTorch-native quantization and sparsity for training and inference ☆2,543 · Updated this week
- AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,695 · Updated last month
- PyTorch-native post-training library ☆5,604 · Updated last week
- A tiny scalar-valued autograd engine and a neural net library on top of it with a PyTorch-like API ☆13,900 · Updated last year
- Efficient Triton Kernels for LLM Training ☆5,892 · Updated this week