andylolu2 / cuda-mnist
Training MLP on MNIST in 1.5 seconds with pure CUDA
☆46 · Updated 9 months ago
Alternatives and similar repositories for cuda-mnist
Users interested in cuda-mnist are comparing it to the libraries listed below.
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆189 · Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆137 · Updated last year
- ☆162 · Updated last year
- Documented and Unit Tested educational Deep Learning framework with Autograd from scratch. ☆120 · Updated last year
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆238 · Updated 11 months ago
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆286 · Updated 2 weeks ago
- High-Performance SGEMM on CUDA devices ☆97 · Updated 6 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆152 · Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 · Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆103 · Updated last year
- The simplest but fast implementation of matrix multiplication in CUDA. ☆37 · Updated last year
- An open-source efficient deep learning framework/compiler, written in python. ☆715 · Updated 3 weeks ago
- Slides, notes, and materials for the workshop ☆329 · Updated last year
- ☆48 · Updated 7 months ago
- Learning about CUDA by writing PTX code. ☆133 · Updated last year
- Notebooks for the "Deep Learning with JAX" book ☆152 · Updated 2 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆568 · Updated this week
- Neural network from scratch in CUDA/C++ ☆83 · Updated 7 months ago
- Puzzles for exploring transformers ☆355 · Updated 2 years ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆190 · Updated 2 months ago
- A parallel framework for training deep neural networks ☆63 · Updated 5 months ago
- A really tiny autograd engine ☆95 · Updated 2 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆73 · Updated last week
- Qualcomm Cloud AI SDK (Platform and Apps) enable high performance deep learning inference on Qualcomm Cloud AI platforms delivering high … ☆64 · Updated last week
- Context Manager to profile the forward and backward times of PyTorch's nn.Module ☆83 · Updated last year
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆616 · Updated last week
- ☆228 · Updated this week
- NVIDIA tools guide ☆144 · Updated 7 months ago
- Fastest kernels written from scratch ☆311 · Updated 4 months ago
- CUDA Learning guide ☆423 · Updated last year