modal-labs / gpu-glossary
GPU documentation for humans
☆51 Updated this week
Alternatives and similar repositories for gpu-glossary
Users that are interested in gpu-glossary are comparing it to the libraries listed below
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated this week
- High-Performance SGEMM on CUDA devices ☆91 Updated 3 months ago
- Learning about CUDA by writing PTX code. ☆129 Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 Updated last month
- ☆52 Updated last week
- Perplexity GPU Kernels ☆289 Updated this week
- Custom PTX Instruction Benchmark ☆123 Updated 2 months ago
- Reference Kernels for the Leaderboard ☆45 Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆173 Updated last week
- extensible collectives library in triton ☆86 Updated last month
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆132 Updated last year
- Fast low-bit matmul kernels in Triton ☆299 Updated this week
- Cataloging released Triton kernels. ☆221 Updated 4 months ago
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆180 Updated this week
- Nvidia Instruction Set Specification Generator ☆260 Updated 10 months ago
- LLM training in simple, raw C/CUDA ☆95 Updated last year
- Fastest kernels written from scratch ☆261 Updated last month
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆324 Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 Updated last month
- AI Tensor Engine for ROCm ☆195 Updated this week
- making the official triton tutorials actually comprehensible ☆30 Updated last month
- ☆204 Updated 3 weeks ago
- ☆79 Updated 6 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆40 Updated 3 weeks ago
- Collection of kernels written in Triton language ☆122 Updated last month
- Cray-LM unified training and inference stack. ☆22 Updated 3 months ago
- ☆102 Updated last month
- NanoGPT-speedrunning for the poor T4 enjoyers ☆65 Updated 3 weeks ago
- Applied AI experiments and examples for PyTorch ☆265 Updated 2 weeks ago
- Where GPUs get cooked 👩‍🍳🔥 ☆229 Updated 2 months ago