R100001 / Programming-Massively-Parallel-Processors

☆193

Alternatives and similar repositories for Programming-Massively-Parallel-Processors

Users who are interested in Programming-Massively-Parallel-Processors are comparing it to the repositories listed below.

siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆908 Updated last month
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆230 Updated last year
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆388 Updated 3 years ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆75 Updated 4 years ago
pranjalssh / fast.cu
Fastest kernels written from scratch
☆377 Updated last month
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆233 Updated 5 months ago
bertmaher / simplegemm
☆121 Updated 7 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆263 Updated last month
olcf / cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
☆877 Updated last year
CisMine / Guide-NVIDIA-Tools
NVIDIA tools guide
☆143 Updated 9 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆240 Updated last week
gpu-mode / resource-stream
GPU programming related news and material links
☆1,746 Updated last month
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆91 Updated last week
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆485 Updated last year
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆385 Updated 2 weeks ago
CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆455 Updated last year
AlphaGPU / leetgpu-challenges
LeetGPU Challenges
☆299 Updated last week
Jokeren / Awesome-GPU
Awesome resources for GPUs
☆599 Updated 2 years ago
guanrenyang / Programming-Massively-Parallel-Processors
Solution of Programming Massively Parallel Processors
☆50 Updated last year
ColfaxResearch / cutlass-kernels
☆241 Updated last year
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆421 Updated 7 months ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆99 Updated last week
mit-han-lab / parallel-computing-tutorial
☆174 Updated 2 years ago
ColfaxResearch / cfx-article-src
☆150 Updated 5 months ago
66RING / tiny-flash-attention
flash attention tutorial written in python, triton, cuda, cutlass
☆428 Updated 5 months ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆385 Updated 9 months ago
CodedK / CUDA-by-Example-source-code-for-the-book-s-examples-
CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. …
☆450 Updated 2 years ago
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆241 Updated 3 months ago
tspeterkim / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆953 Updated 9 months ago
h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆84 Updated 8 years ago