NVIDIA / numba-cuda
The CUDA target for Numba
☆128 Updated this week
Alternatives and similar repositories for numba-cuda
Users who are interested in numba-cuda are comparing it to the libraries listed below.
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 Updated last week
- NVIDIA Math Libraries for the Python Ecosystem ☆318 Updated 2 months ago
- Data Parallel Extension for NumPy ☆108 Updated this week
- KvikIO - High Performance File IO ☆208 Updated this week
- The Foundation for All Legate Libraries ☆217 Updated this week
- Data Parallel Extension for Numba ☆81 Updated 6 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆60 Updated last month
- Python SYCL bindings and SYCL-based Python Array API library ☆112 Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆268 Updated this week
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆81 Updated 2 weeks ago
- LLM training in simple, raw C/CUDA ☆99 Updated last year
- Python bindings for UCX ☆135 Updated this week
- ☆31 Updated this week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆56 Updated last month
- RFC document, tooling and other content related to the array API standard ☆240 Updated last week
- ☆35 Updated this week
- Collection of scripts to build PyTorch and the domain libraries from source. ☆11 Updated this week
- OpenMP for Python in Numba ☆106 Updated last month
- ☆47 Updated last week
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆21 Updated this week
- Kernel Tuner ☆337 Updated this week
- POC work on MLIR backend ☆55 Updated 9 months ago
- Analyze graph/hierarchical performance data using pandas dataframes ☆115 Updated 3 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆331 Updated this week
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- Exploring using stdpar and Cython ☆33 Updated 4 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆264 Updated last week
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆206 Updated 3 weeks ago
- NVIDIA curated collection of educational resources related to general purpose GPU programming. ☆460 Updated last week