andylolu2 / cuda-mnist
Training MLP on MNIST in 1.5 seconds with pure CUDA
☆46 Updated last year
Alternatives and similar repositories for cuda-mnist
Users who are interested in cuda-mnist are comparing it to the libraries listed below.
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆192 Updated 2 years ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆147 Updated 2 years ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆248 Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆305 Updated last week
- Learn CUDA with PyTorch ☆100 Updated last month
- ☆176 Updated last year
- High-Performance SGEMM on CUDA devices ☆109 Updated 9 months ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆367 Updated 6 months ago
- The simplest but fast implementation of matrix multiplication in CUDA. ☆39 Updated last year
- Documented and Unit Tested educational Deep Learning framework with Autograd from scratch. ☆122 Updated last year
- Quantized LLM training in pure CUDA/C++. ☆214 Updated this week
- Custom kernels in Triton language for accelerating LLMs ☆26 Updated last year
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆330 Updated 10 months ago
- Learning about CUDA by writing PTX code. ☆146 Updated last year
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆160 Updated last year
- LLM training in simple, raw C/CUDA ☆107 Updated last year
- A really tiny autograd engine ☆97 Updated 5 months ago
- Simple MPI implementation for prototyping or learning ☆287 Updated 3 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆65 Updated 2 weeks ago
- Neural network from scratch in CUDA/C++ ☆87 Updated 2 months ago
- Fastest kernels written from scratch ☆386 Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 Updated 5 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆116 Updated last week
- Implementation of Flash Attention in Jax ☆220 Updated last year
- ☆337 Updated this week
- NVIDIA tools guide ☆145 Updated 10 months ago
- An open-source efficient deep learning framework/compiler, written in python. ☆733 Updated 2 months ago
- Notebooks for the "Deep Learning with JAX" book ☆158 Updated 5 months ago
- Cataloging released Triton kernels. ☆264 Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆392 Updated 2 weeks ago