modal-labs / gpu-glossary
GPU documentation for humans
☆518 · Updated 2 weeks ago
Alternatives and similar repositories for gpu-glossary
Users interested in gpu-glossary are comparing it to the libraries listed below.
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆551 · Updated 4 months ago
- Simple MPI implementation for prototyping or learning ☆300 · Updated 6 months ago
- Complete solutions to Programming Massively Parallel Processors, 4th Edition ☆658 · Updated 7 months ago
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆201 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆251 · Updated 9 months ago
- Fastest kernels written from scratch ☆533 · Updated 4 months ago
- Helpful kernel tutorials and examples for tile-based GPU programming ☆630 · Updated last week
- Quantized LLM training in pure CUDA/C++. ☆238 · Updated 3 weeks ago
- Learning about CUDA by writing PTX code. ☆152 · Updated last year
- CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-base… ☆823 · Updated 3 weeks ago
- Learn CUDA with PyTorch ☆200 · Updated this week
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… ☆462 · Updated last month
- Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs ☆843 · Updated 2 weeks ago
- An early research stage expert-parallel load balancer for MoE models based on linear programming. ☆496 · Updated 2 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. ☆457 · Updated 11 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆194 · Updated last week
- kernels, of the mega variety ☆672 · Updated 2 weeks ago
- Tilus is a tile-level kernel programming language with explicit control over shared memory and registers. ☆441 · Updated last week
- Our first fully AI generated deep learning system ☆481 · Updated last week
- Competitive GPU kernel optimization platform. ☆153 · Updated this week
- Perplexity GPU Kernels ☆560 · Updated 3 months ago
- Fast and Furious AMD Kernels ☆350 · Updated 2 weeks ago
- A curriculum for learning about GPU performance engineering, from scratch to what the frontier AI labs do ☆349 · Updated last month
- coding CUDA every day! ☆73 · Updated last week
- Hand-Rolled GPU communications library ☆82 · Updated 2 months ago
- High-Performance FP32 GEMM on CUDA devices ☆117 · Updated last year