gau-nernst / learn-cuda

Learn CUDA with PyTorch

☆92

Alternatives and similar repositories for learn-cuda

Users who are interested in learn-cuda are comparing it to the repositories listed below.

gpu-mode / triton-index
Cataloging released Triton kernels.
☆263 Updated last month
gpu-mode / ring-attention
ring-attention experiments
☆154 Updated last year
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆146 Updated 2 years ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆233 Updated 5 months ago
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆193 Updated 4 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆242 Updated this week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆385 Updated last week
gpu-mode / profiling-cuda-in-torch
☆174 Updated last year
huggingface / kernels
Load compute kernels from the Hub
☆304 Updated last week
open-lm-engine / flash-model-architectures
A bunch of kernels that might make stuff slower 😉
☆62 Updated this week
hkproj / triton-flash-attention
☆209 Updated 9 months ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆299 Updated 2 months ago
tspeterkim / paged-attention-minimal
a minimal cache manager for PagedAttention, on top of llama3.
☆124 Updated last year
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆421 Updated 7 months ago
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆157 Updated 6 months ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆264 Updated this week
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66 Updated 7 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223 Updated last year
vdesai2014 / inference-optimization-blog-post
☆89 Updated last year
gau-nernst / quantized-training
Explore training for quantized models
☆25 Updated 3 months ago
evintunador / triton_docs_tutorials
making the official triton tutorials actually comprehensible
☆57 Updated 2 months ago
cchan / tccl
extensible collectives library in triton
☆90 Updated 6 months ago
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆580 Updated 2 months ago
lessw2020 / triton_kernels_for_fun_and_profit
Custom kernels in Triton language for accelerating LLMs
☆26 Updated last year
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆107 Updated 9 months ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆99 Updated last week
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆164 Updated this week
IST-DASLab / qutlass
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆119 Updated 3 weeks ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆377 Updated last month
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆58 Updated 2 weeks ago