tgautam03 / CUDA-C
Simple problems implemented in CUDA C
☆33 · Updated 9 months ago
Alternatives and similar repositories for CUDA-C
Users who are interested in CUDA-C are comparing it to the repositories listed below.
- Neural network from scratch in CUDA/C++ ☆88 · Updated 4 months ago
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆257 · Updated last year
- Tutorials for Triton, a language for writing GPU kernels ☆65 · Updated 2 years ago
- ☆178 · Updated last year
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆175 · Updated last year
- An implementation of the transformer architecture as an NVIDIA CUDA kernel ☆202 · Updated 2 years ago
- High-Performance SGEMM on CUDA devices ☆115 · Updated 11 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆114 · Updated last year
- ☆88 · Updated 2 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆66 · Updated 3 weeks ago
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆177 · Updated this week
- NVIDIA-contributed CUDA tutorial for Numba ☆265 · Updated 3 years ago
- Experiment in using Tangent to autodiff Triton ☆81 · Updated last year
- Slides, notes, and materials for the workshop ☆337 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆154 · Updated 2 years ago
- Parallel framework for training and fine-tuning deep neural networks ☆69 · Updated 2 months ago
- Learning about CUDA by writing PTX code. ☆151 · Updated last year
- Custom kernels in the Triton language for accelerating LLMs ☆27 · Updated last year
- Step-by-step implementation of a fast softmax kernel in CUDA ☆59 · Updated last year
- CUDA Matrix Multiplication Optimization ☆250 · Updated last year
- ☆233 · Updated last year
- This repository contains the experimental PyTorch-native float8 training UX ☆227 · Updated last year
- Training an MLP on MNIST in 1.5 seconds with pure CUDA ☆46 · Updated last year
- Cataloging released Triton kernels. ☆282 · Updated 4 months ago
- Fastest kernels written from scratch ☆517 · Updated 3 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 · Updated 7 months ago
- ☆208 · Updated last year
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆246 · Updated 8 months ago
- Ongoing research training transformer language models at scale, including BERT and GPT-2 ☆17 · Updated last month
- General Matrix Multiplication using NVIDIA Tensor Cores ☆27 · Updated 11 months ago