modal-labs / gpu-glossary

GPU documentation for humans

☆337

Alternatives and similar repositories for gpu-glossary

Users who are interested in gpu-glossary are comparing it to the repositories listed below.

andrewkchan / yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
☆514 Updated last month
Quentin-Anthony / nanoMPI
Simple MPI implementation for prototyping or learning
☆284 Updated 2 months ago
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆143 Updated last year
SzymonOzog / GPU_Programming
☆79 Updated 3 weeks ago
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆338 Updated last week
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆198 Updated last week
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆155 Updated last week
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆421 Updated 7 months ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆232 Updated 5 months ago
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆488 Updated 3 weeks ago
gpu-mode / reference-kernels
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
☆99 Updated this week
HazyResearch / Megakernels
kernels, of the mega variety
☆586 Updated 2 weeks ago
bertmaher / simplegemm
☆120 Updated 7 months ago
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆377 Updated last week
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆107 Updated 8 months ago
Maharshi-Pandya / cudacodes
Learnings and programs related to CUDA
☆420 Updated 3 months ago
tugot17 / pmpp
Complete solutions to the Programming Massively Parallel Processors Edition 4
☆547 Updated 3 months ago
jax-ml / scaling-book
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
☆655 Updated last week
pranjalssh / fast.cu
Fastest kernels written from scratch
☆374 Updated 3 weeks ago
andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in C++
☆313 Updated 2 weeks ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66 Updated 6 months ago
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆341 Updated last month
naklecha / llm-inference-optimizations-explained
in this repository, i'm going to implement increasingly complex llm inference optimizations
☆68 Updated 4 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆261 Updated last month
HenryNdubuaku / cuda-tutorials
CUDA tutorials for Maths & ML tutorials with examples, covers multi-gpus, fused attention, winograd convolution, reinforcement learning.
☆196 Updated 4 months ago
tgautam03 / xGeMM
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
☆161 Updated 9 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆242 Updated last week
gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆914 Updated 8 months ago
simon-mo / vLLM-Benchmark
☆31 Updated 5 months ago
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆105 Updated last year