BobMcDear / neural-network-cuda
Neural network from scratch in CUDA/C++
☆85 Updated 7 months ago
Alternatives and similar repositories for neural-network-cuda
Users who are interested in neural-network-cuda are comparing it to the libraries listed below.
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆239 Updated 11 months ago
- LLM training in simple, raw C/CUDA ☆104 Updated last year
- Simple neural network implementation using CUDA technology. It is an educational implementation. ☆97 Updated 7 years ago
- CUDA Matrix Multiplication Optimization ☆218 Updated last year
- NVIDIA tools guide ☆144 Updated 7 months ago
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆189 Updated last year
- High-Performance SGEMM on CUDA devices ☆99 Updated 7 months ago
- Implement Neural Networks in Cuda from Scratch ☆24 Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆351 Updated this week
- A set of hands-on tutorials for CUDA programming ☆236 Updated last year
- Some CUDA example code with READMEs. ☆170 Updated 6 months ago
- Learning about CUDA by writing PTX code. ☆135 Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 Updated this week
- The simplest but fast implementation of matrix multiplication in CUDA. ☆38 Updated last year
- CUDA Learning guide ☆428 Updated last year
- ☆76 Updated 3 weeks ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆74 Updated last week
- ☆163 Updated last year
- ☆49 Updated 7 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆156 Updated last year
- ☆180 Updated last year
- Learn CUDA with PyTorch ☆67 Updated last week
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆46 Updated 10 months ago
- Training material for Nsight developer tools ☆163 Updated last year
- Multi-Threaded FP32 Matrix Multiplication on x86 CPUs ☆351 Updated 4 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆72 Updated 4 years ago
- Super fast FP32 matrix multiplication on RDNA3 ☆71 Updated 5 months ago
- CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. … ☆436 Updated 2 years ago
- Implementation of Flash Attention in Jax ☆216 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆105 Updated last year