Jimver / cuda-toolkit
GitHub Action to install CUDA
☆160 Updated this week
Alternatives and similar repositories for cuda-toolkit:
Users who are interested in cuda-toolkit are comparing it to the repositories listed below:
- ☆58 Updated 5 months ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆332 Updated last week
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆32 Updated this week
- CUDA Kernel Benchmarking Library ☆550 Updated 2 months ago
- A next generation Python CMake adaptor and Python API for plugins ☆265 Updated this week
- An example combining scikit-build and pybind11 ☆117 Updated this week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆240 Updated this week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆262 Updated 2 weeks ago
- A nanobind example project ☆97 Updated 2 weeks ago
- Pybind11 tool for making docstrings from C++ comments ☆40 Updated 9 months ago
- Training material for Nsight developer tools ☆143 Updated 5 months ago
- NVIDIA Math Libraries for the Python Ecosystem ☆222 Updated last month
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆55 Updated 4 months ago
- An extension library of WMMA API (Tensor Core API) ☆87 Updated 6 months ago
- ☆244 Updated this week
- Data Parallel Extension for NumPy ☆101 Updated this week
- ☆506 Updated last week
- ☆36 Updated 2 months ago
- ☆214 Updated last week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆104 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆294 Updated this week
- AMD’s C++ library for accelerating tensor primitives ☆38 Updated this week
- manylinux docker images with CUDA Toolkit ☆10 Updated last month
- Bandwidth test for ROCm ☆53 Updated this week
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆520 Updated 8 months ago
- AMD SMI ☆49 Updated this week
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆178 Updated last month
- KvikIO - High Performance File IO ☆176 Updated this week
- The Foundation for All Legate Libraries ☆202 Updated last month