NVIDIA / cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
☆7,157 Updated last week
Alternatives and similar repositories for cuda-samples:
Users who are interested in cuda-samples are comparing it to the repositories listed below
- CUDA Library Samples ☆1,838 Updated this week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,735 Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆7,150 Updated this week
- CUDA Core Compute Libraries ☆1,539 Updated this week
- Learn CUDA Programming, published by Packt ☆1,120 Updated last year
- C++ implementation of the Python Numpy library ☆3,766 Updated last month
- Optimized primitives for collective multi-GPU communication ☆3,564 Updated this week
- Sample codes for my CUDA programming book ☆1,669 Updated last month
- [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl ☆4,953 Updated last year
- Source code examples from the Parallel Forall Blog ☆1,269 Updated 7 months ago
- Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch ☆814 Updated last year
- ☆2,366 Updated last year
- [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl ☆2,299 Updated last year
- This is a Chinese translation of the CUDA programming guide ☆1,463 Updated 4 months ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆721 Updated 7 months ago
- HIP: C++ Heterogeneous-Compute Interface for Portability ☆3,935 Updated this week
- CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision. ☆2,466 Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆657 Updated last month
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,296 Updated this week
- 📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉). ☆2,901 Updated last week
- NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone… ☆11,350 Updated last week
- Lightning fast C++/CUDA neural network framework ☆3,928 Updated last month
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs… ☆2,293 Updated this week
- CUDA Python: Performance meets Productivity ☆1,212 Updated this week
- ☆427 Updated 9 years ago
- ☆1,841 Updated last year
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,037 Updated this week
- CUDA Kernel Benchmarking Library ☆593 Updated last week
- how to optimize some algorithm in cuda. ☆2,022 Updated this week
- An Open Source Machine Learning Framework for Everyone ☆1,108 Updated 5 months ago