gpu-mode / resource-stream

GPU programming related news and material links

☆1,642

Alternatives and similar repositories for resource-stream

Users who are interested in resource-stream are comparing it to the libraries listed below.

srush / Triton-Puzzles
Puzzles for learning Triton
☆1,801 Updated 8 months ago
gpu-mode / awesomeMLSys
An ML Systems Onboarding list
☆849 Updated 6 months ago
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆2,541 Updated this week
siboehm / SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
☆782 Updated last year
gpu-mode / lectures
Material for gpu-mode lectures
☆4,794 Updated last month
tspeterkim / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆887 Updated 7 months ago
srush / LLM-Training-Puzzles
What would you do with 1000 H100s...
☆1,068 Updated last year
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for education purpose
☆1,619 Updated 3 weeks ago
olcf / cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
☆825 Updated 11 months ago
HazyResearch / aisys-building-blocks
Building blocks for foundation models.
☆519 Updated last year
R100001 / Programming-Massively-Parallel-Processors
☆173 Updated 11 months ago
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆563 Updated 2 weeks ago
CisMine / Parallel-Computing-Cuda-C
CUDA Learning guide
☆414 Updated last year
clu0 / unet.cu
UNet diffusion model in pure CUDA
☆612 Updated last year
mirage-project / mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
☆1,629 Updated this week
rkinas / triton-resources
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
☆382 Updated 4 months ago
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆1,472 Updated this week
rwitten / HighPerfLLMs2024
☆516 Updated last year
Lightning-AI / lightning-thunder
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri…
☆1,384 Updated this week
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,219 Updated this week
a-hamdi / GPU
100 days of building GPU kernels!
☆470 Updated 3 months ago
MekkCyber / CutlassAcademy
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
☆203 Updated 2 months ago
Infatoshi / cuda-course
☆1,309 Updated last month
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆328 Updated last year
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆3,448 Updated this week
gpu-mode / profiling-cuda-in-torch
☆162 Updated last year
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆362 Updated 3 years ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,587 Updated this week
EleutherAI / cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
☆808 Updated 2 weeks ago
linjames0 / Transformer-CUDA
An implementation of the transformer architecture onto an Nvidia CUDA kernel
☆189 Updated last year