leo811121 / UIUC-CS-483-Parallel-ProgrammingLinks

☆20

Alternatives and similar repositories for UIUC-CS-483-Parallel-Programming

Users that are interested in UIUC-CS-483-Parallel-Programming are comparing it to the libraries listed below

Sorting:

gpu-mode / reference-kernels
Reference Kernels for the Leaderboard
☆61Updated last week
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆190Updated last month
gpu-mode / profiling-cuda-in-torch
☆160Updated last year
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆70Updated 4 years ago
SzymonOzog / FastSoftmax
☆40Updated 5 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆240Updated 5 months ago
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆99Updated last year
Deep-Learning-Profiling-Tools / triton-viz
☆221Updated this week
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆134Updated 2 months ago
mit-han-lab / parallel-computing-tutorial
☆170Updated last year
gpu-mode / ring-attention
ring-attention experiments
☆144Updated 8 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆46Updated this week
alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆19Updated 11 months ago
MDK8888 / vllmini
A minimal implementation of vllm.
☆44Updated 11 months ago
nod-ai / techtalks
☆16Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆133Updated 2 weeks ago
stas00 / ml-ways
ML/DL Math and Method notes
☆61Updated last year
SzymonOzog / GPU_Programming
☆61Updated this week
gau-nernst / learn-cuda
Learn CUDA with PyTorch
☆27Updated last week
dlsyscourse / public_notebooks
☆58Updated 7 months ago
hkproj / triton-flash-attention
☆176Updated 5 months ago
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆323Updated last week
dlsyscourse / hw0
☆36Updated last year
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66Updated 3 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆69Updated last month
Leo-Adventure / CS182-Berkeley-Deep-Learning-23Spring
This repository contains materials from the author's deep learning course at UC Berkeley lectured by Prof. Sahai, including coursework, a…
☆34Updated 2 years ago
gau-nernst / quantized-training
Explore training for quantized models
☆18Updated last week
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆31Updated 2 months ago
hao-ai-lab / cse234-w25-PA
☆33Updated 3 months ago
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆344Updated 3 years ago