modular / mojo-gpu-puzzles
Learn GPU Programming in Mojo🔥 by Solving Puzzles
☆288Updated this week
Alternatives and similar repositories for mojo-gpu-puzzles
Users who are interested in mojo-gpu-puzzles are comparing it to the libraries listed below.
- Nabla is a novel Distributed-Tensor and Scientific-Computing Framework; built from scratch on top of Mojo and MAX☆318Updated this week
- Port of Andrej Karpathy's llm.c to Mojo☆363Updated 6 months ago
- Competitive GPU kernel optimization platform.☆153Updated this week
- A Machine Learning framework from scratch in Pure Mojo 🔥☆441Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best!☆68Updated last week
- A Learning Journey: Micrograd in Mojo 🔥☆65Updated last year
- Simple MPI implementation for prototyping or learning☆300Updated 6 months ago
- Quantized LLM training in pure CUDA/C++.☆235Updated 2 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.☆155Updated 2 years ago
- Where GPUs get cooked 👩🍳🔥☆363Updated 2 weeks ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆198Updated 8 months ago
- Learning about CUDA by writing PTX code.☆152Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs☆843Updated last week
- Tensor library with autograd using only Rust's standard library☆71Updated last year
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆201Updated this week
- SIMD quantization kernels☆94Updated 5 months ago
- A working machine learning framework in pure Mojo 🔥☆130Updated last year
- A curriculum for learning about GPU performance engineering, from scratch to what the frontier AI labs do☆341Updated 3 weeks ago
- Implementation of Karpathy's micrograd in Mojo☆77Updated 2 years ago
- PyTorch Single Controller☆957Updated this week
- Solve puzzles to improve your tinygrad skills!☆178Updated 3 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel☆202Updated 2 years ago
- NuMojo is a library for numerical computing in Mojo 🔥 similar to numpy in Python.☆199Updated 2 weeks ago
- GPU documentation for humans☆518Updated last week
- High-Performance FP32 GEMM on CUDA devices☆117Updated last year
- Machine Learning algorithms in pure Mojo 🔥☆63Updated last week
- torchax is a PyTorch frontend for JAX. It gives JAX the ability to author JAX programs using familiar PyTorch syntax. It also provides JA…☆175Updated this week
- Alex Krizhevsky's original code from Google Code☆199Updated 9 years ago
- Visualization of cache-optimized matrix multiplication☆157Updated 10 months ago