modal-labs / gpu-glossary
GPU documentation for humans
☆414 · Updated 2 weeks ago
Alternatives and similar repositories for gpu-glossary
Users who are interested in gpu-glossary are comparing it to the repositories listed below.
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆537 · Updated 2 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆244 · Updated 7 months ago
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆402 · Updated 3 weeks ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆164 · Updated last week
- Simple MPI implementation for prototyping or learning ☆291 · Updated 4 months ago
- (no description) ☆85 · Updated 3 weeks ago
- (no description) ☆127 · Updated last month
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆408 · Updated this week
- An early research stage expert-parallel load balancer for MoE models based on linear programming. ☆433 · Updated 2 weeks ago
- Perplexity GPU Kernels ☆534 · Updated last month
- Learning about CUDA by writing PTX code. ☆148 · Updated last year
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs ☆523 · Updated this week
- Fastest kernels written from scratch ☆405 · Updated 2 months ago
- torchcomms: a modern PyTorch communications API ☆298 · Updated this week
- kernels, of the mega variety ☆618 · Updated 2 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆438 · Updated 8 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆177 · Updated this week
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆710 · Updated last week
- Quantized LLM training in pure CUDA/C++. ☆220 · Updated this week
- Complete solutions to Programming Massively Parallel Processors, 4th Edition ☆595 · Updated 5 months ago
- PTX tutorial written purely by AIs (OpenAI's Deep Research and Claude 3.7) ☆66 · Updated 8 months ago
- LeetGPU Challenges ☆516 · Updated last week
- (no description) ☆257 · Updated this week
- Cataloging released Triton kernels. ☆274 · Updated 2 months ago
- Hand-Rolled GPU communications library ☆72 · Updated last week
- Fast low-bit matmul kernels in Triton ☆402 · Updated 2 weeks ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆691 · Updated last week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆658 · Updated this week
- Materials for learning SGLang ☆667 · Updated last week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆236 · Updated last week