tgautam03 / CUDA-C
Simple problems implemented in CUDA C
☆31 Updated 8 months ago
Alternatives and similar repositories for CUDA-C
Users who are interested in CUDA-C are comparing it to the repositories listed below.
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆257 Updated last year
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆196 Updated 2 years ago
- Neural network from scratch in CUDA/C++ ☆87 Updated 3 months ago
- High-Performance SGEMM on CUDA devices ☆113 Updated 10 months ago
- ☆177 Updated last year
- Step by step implementation of a fast softmax kernel in CUDA ☆59 Updated 11 months ago
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆174 Updated 11 months ago
- ☆86 Updated last month
- Experiment of using Tangent to autodiff triton ☆81 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆112 Updated last year
- Notes on quantization in neural networks ☆113 Updated 2 years ago
- General Matrix Multiplication using NVIDIA Tensor Cores ☆27 Updated 10 months ago
- This is a port of Mistral-7B model in JAX ☆32 Updated last year
- NVIDIA tools guide ☆150 Updated 11 months ago
- Collection of kernels written in Triton language ☆173 Updated 8 months ago
- ☆227 Updated 11 months ago
- Competitive GPU kernel optimization platform. ☆142 Updated last week
- Learning about CUDA by writing PTX code. ☆150 Updated last year
- Custom kernels in Triton language for accelerating LLMs ☆27 Updated last year
- ☆203 Updated last year
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆153 Updated 2 years ago
- Tutorials for Triton, a language for writing gpu kernels ☆61 Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX ☆227 Updated last year
- ☆262 Updated this week
- LLM training in simple, raw C/CUDA ☆108 Updated last year
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆195 Updated 6 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 Updated last month
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆244 Updated 7 months ago
- CUDA Matrix Multiplication Optimization ☆245 Updated last year
- Training MLP on MNIST in 1.5 seconds with pure CUDA ☆46 Updated last year