NVIDIA / accelerated-computing-hub
NVIDIA's curated collection of educational resources related to general-purpose GPU programming.
☆460 · Updated last week
Alternatives and similar repositories for accelerated-computing-hub
Users who are interested in accelerated-computing-hub are comparing it to the libraries listed below.
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆268 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆318 · Updated 2 months ago
- Step-by-step optimization of CUDA SGEMM ☆333 · Updated 3 years ago
- Evaluating Large Language Models for CUDA Code Generation: ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆45 · Updated last month
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆181 · Updated 3 weeks ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆718 · Updated 3 months ago
- CUDA Kernel Benchmarking Library ☆650 · Updated last week
- Experimental projects related to TensorRT ☆105 · Updated this week
- CUDA Matrix Multiplication Optimization ☆188 · Updated 10 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆331 · Updated this week
- Fastest kernels written from scratch ☆269 · Updated 2 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆388 · Updated last week
- The CUDA target for Numba ☆128 · Updated this week
- ☆105 · Updated 2 months ago
- NVIDIA tools guide ☆133 · Updated 4 months ago
- Fast CUDA matrix multiplication from scratch ☆730 · Updated last year
- Kernel Tuner ☆337 · Updated last week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆775 · Updated 9 months ago
- ☆158 · Updated 10 months ago
- Cataloging released Triton kernels ☆226 · Updated 4 months ago
- Collection of benchmarks to measure basic GPU capabilities ☆376 · Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- ☆538 · Updated last week
- ☆215 · Updated this week
- Applied AI experiments and examples for PyTorch ☆271 · Updated last week
- AI Tensor Engine for ROCm ☆201 · Updated this week
- ☆157 · Updated last year
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆401 · Updated this week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆233 · Updated 8 months ago
- Perplexity GPU Kernels ☆324 · Updated 2 weeks ago