modular / mojo-gpu-puzzles
Learn GPU Programming in Mojo🔥 by Solving Puzzles
☆266Updated 2 weeks ago
Alternatives and similar repositories for mojo-gpu-puzzles
Users who are interested in mojo-gpu-puzzles are comparing it to the libraries listed below
- Machine Learning library for the emerging Mojo/Python ecosystem☆303Updated this week
- port of Andrej Karpathy's llm.c to Mojo☆362Updated 5 months ago
- A Machine Learning framework from scratch in Pure Mojo 🔥☆441Updated 11 months ago
- Competitive GPU kernel optimization platform.☆144Updated this week
- Quantized LLM training in pure CUDA/C++.☆230Updated this week
- Tensor library with autograd using only Rust's standard library☆71Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI.☆154Updated 2 years ago
- SIMD quantization kernels☆93Updated 4 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best!☆66Updated 2 weeks ago
- Simple MPI implementation for prototyping or learning☆297Updated 5 months ago
- A Learning Journey: Micrograd in Mojo 🔥☆65Updated last year
- Learning about CUDA by writing PTX code.☆151Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand☆195Updated 7 months ago
- Solve puzzles to improve your tinygrad skills!☆174Updated 2 months ago
- PyTorch Single Controller☆939Updated this week
- Where GPUs get cooked 👩🍳🔥☆345Updated 3 months ago
- Implementation of Karpathy's micrograd in Mojo☆78Updated 2 years ago
- A working machine learning framework in pure Mojo 🔥☆131Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs☆801Updated this week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆182Updated 2 weeks ago
- Fast and Furious AMD Kernels☆331Updated last week
- Tutorials on tinygrad☆448Updated 2 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel☆201Updated 2 years ago
- (WIP) A small but powerful, homemade PyTorch from scratch.☆664Updated last week
- High-Performance SGEMM on CUDA devices☆114Updated 11 months ago
- Visualization of cache-optimized matrix multiplication☆157Updated 9 months ago
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP☆141Updated 3 months ago
- A zero-to-one guide on scaling modern transformers with n-dimensional parallelism.☆112Updated last week
- Complete solutions to the Programming Massively Parallel Processors Edition 4☆619Updated 6 months ago