BobMcDear / neural-network-cuda
Neural network from scratch in CUDA/C++
☆78 Updated 3 months ago
Alternatives and similar repositories for neural-network-cuda:
Users interested in neural-network-cuda are comparing it to the libraries listed below:
- Implement Neural Networks in CUDA from Scratch ☆22 Updated 11 months ago
- CUDA Matrix Multiplication Optimization ☆181 Updated 9 months ago
- Simple neural network implementation using CUDA technology. It is an educational implementation. ☆96 Updated 7 years ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆179 Updated last year
- NVIDIA tools guide ☆129 Updated 3 months ago
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- PyTorch implementation of the vision transformer ☆18 Updated 2 years ago
- Reference Kernels for the Leaderboard ☆33 Updated last week
- ☆31 Updated 3 months ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆114 Updated 3 months ago
- ☆51 Updated last week
- The simplest but fast implementation of matrix multiplication in CUDA. ☆34 Updated 9 months ago
- Training material for Nsight developer tools ☆156 Updated 8 months ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆130 Updated 4 years ago
- Some CUDA example code with READMEs. ☆94 Updated last month
- Zero to Hero GPU and CUDA for Maths & ML tutorials with examples. ☆182 Updated last week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆226 Updated 7 months ago
- ☆152 Updated 8 months ago
- Fast CUDA matrix multiplication from scratch ☆691 Updated last year
- ☆200 Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆165 Updated last month
- Step-by-step optimization of CUDA SGEMM ☆310 Updated 3 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆534 Updated this week
- Serial and parallel implementations of matrix multiplication ☆40 Updated 4 years ago
- Cataloging released Triton kernels. ☆217 Updated 3 months ago
- My C++ deep learning framework & other machine learning algorithms ☆87 Updated last year
- Fastest kernels written from scratch ☆236 Updated 3 weeks ago
- Collection of kernels written in Triton language ☆120 Updated 3 weeks ago
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆348 Updated this week