BobMcDear / neural-network-cuda
Neural network from scratch in CUDA/C++
☆78 Updated last month
Alternatives and similar repositories for neural-network-cuda:
Users who are interested in neural-network-cuda are comparing it to the repositories listed below.
- Implement Neural Networks in Cuda from Scratch ☆22 Updated 9 months ago
- CUDA Matrix Multiplication Optimization ☆169 Updated 7 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆173 Updated last year
- LLM training in simple, raw C/CUDA ☆92 Updated 10 months ago
- PyTorch implementation of the vision transformer ☆18 Updated last year
- Cataloging released Triton kernels. ☆185 Updated 2 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆313 Updated this week
- ☆136 Updated 7 months ago
- Fast CUDA matrix multiplication from scratch ☆659 Updated last year
- Simple neural network implementation using CUDA technology. It is an educational implementation. ☆96 Updated 6 years ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆125 Updated last year
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆65 Updated 4 years ago
- PyTorch implementation of EfficientNet ☆10 Updated 2 years ago
- ☆188 Updated 3 weeks ago
- ☆148 Updated last year
- Some CUDA example code with READMEs. ☆79 Updated last week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆215 Updated 6 months ago
- NVIDIA tools guide ☆112 Updated 2 months ago
- Awesome resources for GPUs ☆551 Updated last year
- ☆28 Updated 2 months ago
- Fastest kernels written from scratch ☆188 Updated last week
- Class of High Performance Computing taken at U.T.P 2017 ☆48 Updated 7 years ago
- Step-by-step optimization of CUDA SGEMM ☆293 Updated 2 years ago
- High-Performance SGEMM on CUDA devices ☆86 Updated last month
- A parallel framework for training deep neural networks ☆56 Updated this week
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆145 Updated 9 months ago
- CUDA Learning guide ☆339 Updated 8 months ago
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆45 Updated 4 months ago
- Customized matrix multiplication kernels ☆53 Updated 3 years ago
- Collection of kernels written in Triton language ☆110 Updated 3 weeks ago