Jokeren/Awesome-GPU

Awesome resources for GPUs

☆636

Alternatives and similar repositories for Awesome-GPU

Users who are interested in Awesome-GPU are comparing it to the libraries listed below.

GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆54 · Mar 24, 2024 · Updated 2 years ago
Lin-Mao / DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
☆36 · May 30, 2026 · Updated last month
merrymercy / awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
☆2,768 · Oct 19, 2024 · Updated last year
cloudcores / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆609 · Apr 20, 2023 · Updated 3 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆126 · Jul 11, 2022 · Updated 4 years ago
Jokeren / GPA
GPU Performance Advisor
☆66 · Jul 25, 2022 · Updated 4 years ago
Erkaman / Awesome-CUDA
This is a list of useful libraries and resources for CUDA development.
☆622 · Oct 8, 2017 · Updated 8 years ago
RRZE-HPC / gpu-benches
A collection of benchmarks to measure basic GPU capabilities
☆530 · Oct 24, 2025 · Updated 9 months ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Mar 31, 2023 · Updated 3 years ago
ByteDance-Seed / Triton-distributed
Distributed Compiler based on Triton for Parallel Systems
☆1,498 · Updated this week
hkust-adsl / gass
☆43 · Apr 3, 2022 · Updated 4 years ago
NVIDIA / multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
☆909 · Sep 26, 2025 · Updated 10 months ago
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,125 · Updated this week
NVlabs / NVBit
☆342 · Apr 6, 2026 · Updated 3 months ago
eth-cscs / Tiled-MM
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cuBLASXt, but ported to both NVIDIA and AMD GPUs.
☆33 · Apr 2, 2025 · Updated last year
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆3,563 · Jul 13, 2026 · Updated last week
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆910 · Updated this week
CMU-SAFARI / Mosaic
Source code of the simulator used in the Mosaic paper from MICRO 2017: "Mosaic: A GPU Memory Manager with Application-Transparent Support…
☆49 · Aug 21, 2018 · Updated 7 years ago
Cjkkkk / CUDA_gemm
A simple, high-performance CUDA GEMM implementation.
☆437 · Jan 4, 2024 · Updated 2 years ago
KnowingNothing / MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
DebashisGanguly / gpgpu-sim_UVMSmart
☆83 · Nov 16, 2020 · Updated 5 years ago
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆201 · Apr 27, 2022 · Updated 4 years ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆542 · Updated this week
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆362 · Updated this week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆1,070 · Updated this week
AmadeusChan / Awesome-LLM-System-Papers
☆646 · Jan 14, 2026 · Updated 6 months ago
jslee02 / awesome-gpgpu
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
☆110 · May 11, 2026 · Updated 2 months ago
triton-lang / triton
Development repository for the Triton language and compiler
☆19,782 · Updated this week
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆246 · Jan 13, 2022 · Updated 4 years ago
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,074 · Jan 3, 2023 · Updated 3 years ago
sderek / CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
☆53 · Aug 24, 2018 · Updated 7 years ago
accel-sim / accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
☆630 · Mar 24, 2026 · Updated 4 months ago
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆396 · May 31, 2026 · Updated last month
bytedance / flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,345 · Aug 28, 2025 · Updated 10 months ago
sunlex0717 / DissectingTensorCores
☆114 · Apr 19, 2024 · Updated 2 years ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆501 · Mar 24, 2025 · Updated last year
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆340 · Dec 5, 2025 · Updated 7 months ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,448 · Updated this week