NVIDIA / cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
☆6,087 Updated last month
Related projects:
- CUDA Library Samples ☆1,519 Updated last week
- CUDA Templates for Linear Algebra Subroutines ☆5,359 Updated this week
- [ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl ☆4,907 Updated 7 months ago
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl ☆1,669 Updated 11 months ago
- CUDA Core Compute Libraries ☆1,132 Updated this week
- ☆2,104 Updated 8 months ago
- NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone… ☆10,552 Updated last week
- Optimized primitives for collective multi-GPU communication ☆3,132 Updated this week
- oneAPI Deep Neural Network Library (oneDNN) ☆3,579 Updated this week
- ☆1,725 Updated last year
- [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl ☆2,293 Updated 7 months ago
- Sample codes for my CUDA programming book ☆1,524 Updated last year
- Learn CUDA Programming, published by Packt ☆987 Updated 8 months ago
- CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision. ☆2,325 Updated 2 weeks ago
- Development repository for the Triton language and compiler ☆12,698 Updated this week
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆11,602 Updated this week
- Source code examples from the Parallel Forall Blog ☆1,223 Updated last month
- ArrayFire: a general purpose GPU library. ☆4,525 Updated last week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆2,577 Updated this week
- Transformer related optimization, including BERT, GPT ☆5,773 Updated 5 months ago
- Seamless operability between C++11 and Python ☆15,425 Updated this week
- NumPy & SciPy for GPU ☆8,124 Updated this week
- C++ implementation of the Python Numpy library ☆3,520 Updated 9 months ago
- The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologi… ☆2,800 Updated 3 weeks ago
- oneAPI Threading Building Blocks (oneTBB) ☆5,603 Updated this week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆541 Updated last month
- Open MPI main development repository ☆2,123 Updated this week
- OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version. ☆6,281 Updated this week
- HIP: C++ Heterogeneous-Compute Interface for Portability ☆3,690 Updated this week
- a language for fast, portable data-parallel computation ☆5,835 Updated this week