modal-labs / gpu-glossary
GPU documentation for humans
☆44Updated this week
Alternatives and similar repositories for gpu-glossary:
Users who are interested in gpu-glossary are comparing it to the repositories listed below:
- Write a fast kernel and run it on Discord. See how you compare against the best!☆41Updated this week
- High-Performance SGEMM on CUDA devices☆90Updated 3 months ago
- LLM training in simple, raw C/CUDA☆92Updated 11 months ago
- Learning about CUDA by writing PTX code.☆128Updated last year
- Custom PTX Instruction Benchmark☆122Updated last month
- ☆27Updated 3 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆65Updated last month
- ☆51Updated last week
- extensible collectives library in triton☆85Updated 3 weeks ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.☆130Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆40Updated last month
- ☆16Updated last week
- ☆13Updated last month
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…☆29Updated 3 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems☆274Updated last week
- ☆98Updated last month
- ScalarLM - a unified training and inference stack☆33Updated last week
- An experimental CPU backend for Triton☆105Updated 2 weeks ago
- Collection of kernels written in Triton language☆120Updated 3 weeks ago
- ring-attention experiments☆130Updated 6 months ago
- Fast low-bit matmul kernels in Triton☆291Updated this week
- Reference Kernels for the Leaderboard☆33Updated last week
- Custom kernels in Triton language for accelerating LLMs☆18Updated last year
- Cataloging released Triton kernels.☆217Updated 3 months ago
- Perplexity GPU Kernels☆251Updated this week
- Nvidia Instruction Set Specification Generator☆256Updated 9 months ago
- AI Tensor Engine for ROCm☆180Updated this week
- ☆200Updated this week
- Fastest kernels written from scratch☆236Updated 3 weeks ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆62Updated this week