modular / mojo-gpu-puzzles
Learn GPU Programming in Mojo🔥 by Solving Puzzles
☆146Updated last week
Alternatives and similar repositories for mojo-gpu-puzzles
Users who are interested in mojo-gpu-puzzles are comparing it to the libraries listed below.
- JAX-like Neural Network Training Library in Python with CPU/GPU Acceleration via Mojo and MAX☆286Updated last month
- A Machine Learning framework from scratch in Pure Mojo 🔥☆440Updated 8 months ago
- port of Andrej Karpathy's llm.c to Mojo☆357Updated 2 months ago
- ☆28Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.☆144Updated last year
- A Learning Journey: Micrograd in Mojo 🔥☆62Updated 11 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best!☆58Updated 2 weeks ago
- Accurate, Hardware Accelerated, Special Functions in Mojo 🔥☆35Updated 10 months ago
- Simple MPI implementation for prototyping or learning☆284Updated 2 months ago
- NuMojo is a library for numerical computing in Mojo 🔥, similar to NumPy in Python.☆185Updated 2 weeks ago
- A working machine learning framework in pure Mojo 🔥☆131Updated last year
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs☆650Updated this week
- Quantized LLM training in pure CUDA/C++.☆180Updated this week
- ☆53Updated 2 months ago
- PyTorch Single Controller☆435Updated this week
- Implementation of Karpathy's micrograd in Mojo☆77Updated last year
- SIMD quantization kernels☆87Updated last month
- Learning about CUDA by writing PTX code.☆138Updated last year
- The Tensor (or Array)☆449Updated last year
- Tutorials on tinygrad☆419Updated 2 weeks ago
- Where GPUs get cooked 👩‍🍳🔥☆285Updated 3 weeks ago
- Competitive GPU kernel optimization platform.☆106Updated last week
- A fast and compact Dict implementation in Mojo 🔥☆36Updated 2 months ago
- Machine Learning algorithms in pure Mojo 🔥☆45Updated 2 weeks ago
- Solve puzzles to improve your tinygrad skills!☆145Updated 7 months ago
- Tensor library with autograd using only Rust's standard library☆69Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning.☆301Updated this week
- PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP☆126Updated last month
- High-Performance SGEMM on CUDA devices☆107Updated 8 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!☆97Updated 2 weeks ago