srush / Tensor-Puzzles
Solve puzzles. Improve your pytorch.
☆3,259 Updated 3 months ago
Related projects
Alternatives and complementary repositories for Tensor-Puzzles
- Solve puzzles. Learn CUDA. ☆9,851 Updated 2 months ago
- What would you do with 1000 H100s... ☆892 Updated 9 months ago
- Puzzles for learning Triton ☆1,060 Updated last month
- GPU programming related news and material links ☆1,205 Updated last month
- A JAX research toolkit for building, editing, and visualizing neural networks. ☆1,673 Updated last week
- Material for gpu-mode lectures ☆2,962 Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆5,647 Updated 3 weeks ago
- An autoregressive character-level language model for making more things ☆2,582 Updated 5 months ago
- ☆386 Updated 3 weeks ago
- Schedule-Free Optimization in PyTorch ☆1,864 Updated this week
- Tensors, for human consumption ☆1,111 Updated last week
- The full minitorch student suite. ☆1,910 Updated 2 months ago
- A native PyTorch Library for large model training ☆2,566 Updated this week
- Machine Learning Engineering Open Book ☆11,589 Updated this week
- NanoGPT (124M) quality in 8.2 minutes ☆911 Updated this week
- Language model alignment-focused deep learning curriculum ☆1,263 Updated 2 months ago
- Tile primitives for speedy kernels ☆1,629 Updated this week
- Puzzles for exploring transformers ☆321 Updated last year
- UNet diffusion model in pure CUDA ☆567 Updated 4 months ago
- nanoGPT style version of Llama 3.1 ☆1,229 Updated 3 months ago
- A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API ☆10,434 Updated 3 months ago
- llama3 implementation one matrix multiplication at a time ☆13,673 Updated 5 months ago
- Cramming the training of a (BERT-type) language model into limited compute. ☆1,294 Updated 4 months ago
- Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors a… ☆1,187 Updated this week
- Video+code lecture on building nanoGPT from scratch ☆3,573 Updated 2 months ago
- A minimal GPU design in Verilog to learn how GPUs work from the ground up ☆7,066 Updated 2 months ago
- Pure Python from-scratch zero-dependency implementation of Bitcoin for educational purposes ☆1,606 Updated 3 years ago
- LLM papers I'm reading, mostly on inference and model compression ☆692 Updated 10 months ago
- Simple, minimal implementation of the Mamba SSM in one file of PyTorch. ☆2,614 Updated 8 months ago
- A deep dive into embeddings starting from fundamentals ☆953 Updated 6 months ago