tgautam03 / CUDA-C
Simple problems implemented in CUDA C
☆20 Updated 2 months ago
Alternatives and similar repositories for CUDA-C
Users who are interested in CUDA-C are comparing it to the repositories listed below
- Accelerated General (FP32) Matrix Multiplication from scratch in CUDA ☆118 Updated 5 months ago
- Apply GPU in ML and DL ☆52 Updated 4 months ago
- ☆174 Updated 5 months ago
- ☆59 Updated this week
- Learn CUDA with PyTorch ☆27 Updated this week
- Tutorials for Triton, a language for writing gpu kernels ☆24 Updated last year
- ☆40 Updated 5 months ago
- ML/DL Math and Method notes ☆61 Updated last year
- Learning about CUDA by writing PTX code. ☆132 Updated last year
- This is a port of Mistral-7B model in JAX ☆32 Updated 11 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆17 Updated last week
- Some CUDA example code with READMEs. ☆168 Updated 3 months ago
- Custom kernels in Triton language for accelerating LLMs ☆22 Updated last year
- ☆39 Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 Updated 3 weeks ago
- A parallel framework for training deep neural networks ☆61 Updated 3 months ago
- Notes and code for Programming Massively Parallel Processors ☆12 Updated 2 months ago
- Recreating PyTorch from scratch (C/C++, CUDA, NCCL and Python, with multi-GPU support and automatic differentiation!) ☆150 Updated last year
- ☆159 Updated last year
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆189 Updated last month
- This material contains content on how to profile and optimize simple Pytorch mnist code using NVIDIA Nsight Systems and Pytorch Profiler ☆13 Updated 2 years ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆134 Updated last year
- making the official triton tutorials actually comprehensible ☆41 Updated 3 months ago
- ☆41 Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆46 Updated this week
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆187 Updated last year
- Code for the book "The Elements of Differentiable Programming". ☆88 Updated last week
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆31 Updated 2 months ago
- NVIDIA tools guide ☆135 Updated 5 months ago
- Experiment of using Tangent to autodiff triton ☆79 Updated last year