tgautam03 / CUDA-C
Simple problems implemented in CUDA C
☆33 · Updated 9 months ago
Alternatives and similar repositories for CUDA-C
Users who are interested in CUDA-C are comparing it to the repositories listed below
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆202 · Updated 2 years ago
- Neural network from scratch in CUDA/C++ ☆88 · Updated 4 months ago
- ☆178 · Updated last year
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆257 · Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆155 · Updated 2 years ago
- Learn CUDA with PyTorch ☆185 · Updated last week
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆197 · Updated 8 months ago
- Tutorials for Triton, a language for writing gpu kernels ☆72 · Updated 2 years ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆249 · Updated 8 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated last week
- ☆89 · Updated 2 months ago
- High-Performance FP32 GEMM on CUDA devices ☆117 · Updated last year
- Step by step implementation of a fast softmax kernel in CUDA ☆60 · Updated last year
- ☆234 · Updated last year
- Learning about CUDA by writing PTX code. ☆151 · Updated last year
- Cataloging released Triton kernels. ☆291 · Updated 4 months ago
- Competitive GPU kernel optimization platform. ☆149 · Updated last week
- Custom kernels in Triton language for accelerating LLMs ☆27 · Updated last year
- Experiment of using Tangent to autodiff triton ☆82 · Updated 2 years ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆181 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆46 · Updated last year
- Collection of kernels written in Triton language ☆175 · Updated 9 months ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆28 · Updated last year
- CUDA Matrix Multiplication Optimization ☆256 · Updated last year
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆457 · Updated 10 months ago
- ☆277 · Updated this week
- Slides, notes, and materials for the workshop ☆339 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- Nvidia contributed CUDA tutorial for Numba ☆265 · Updated 3 years ago
- making the official triton tutorials actually comprehensible ☆104 · Updated 5 months ago