NVIDIA / numba-cuda
The CUDA target for Numba
☆158 Updated this week
Alternatives and similar repositories for numba-cuda
Users who are interested in numba-cuda are comparing it to the libraries listed below.
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆48 Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆336 Updated 3 weeks ago
- The Foundation for All Legate Libraries ☆219 Updated this week
- Data Parallel Extension for NumPy ☆109 Updated this week
- Data Parallel Extension for Numba ☆82 Updated 8 months ago
- KvikIO - High Performance File IO ☆220 Updated this week
- Python SYCL bindings and SYCL-based Python Array API library ☆116 Updated this week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆63 Updated 3 months ago
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆284 Updated last week
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆86 Updated 2 months ago
- POC work on MLIR backend ☆56 Updated 11 months ago
- LLM training in simple, raw C/CUDA ☆102 Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆345 Updated this week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆427 Updated last week
- Collection of scripts to build PyTorch and the domain libraries from source. ☆12 Updated 2 weeks ago
- ☆32 Updated this week
- ☆65 Updated last week
- GitHub Action to install CUDA ☆182 Updated last week
- ☆49 Updated 2 months ago
- Python bindings for UCX ☆137 Updated last week
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆23 Updated this week
- HIP Python Low-level Bindings ☆29 Updated 2 months ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆58 Updated 2 weeks ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆287 Updated last month
- RFC document, tooling and other content related to the array API standard ☆242 Updated last month
- Kernel Tuner ☆356 Updated last week
- ☆36 Updated this week
- High-Performance SGEMM on CUDA devices ☆98 Updated 6 months ago
- ☆49 Updated this week
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆77 Updated 4 months ago