NVIDIA / numba-cuda
The CUDA target for Numba
☆210 Updated this week
Alternatives and similar repositories for numba-cuda
Users who are interested in numba-cuda are comparing it to the libraries listed below.
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆52 Updated this week
- NVIDIA Math Libraries for the Python Ecosystem ☆532 Updated 2 months ago
- The Foundation for All Legate Libraries ☆232 Updated last week
- Data Parallel Extension for Numba ☆87 Updated last month
- Data Parallel Extension for NumPy ☆118 Updated this week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆306 Updated 2 weeks ago
- KvikIO - High Performance File IO ☆231 Updated last week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆68 Updated 7 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆362 Updated this week
- Python SYCL bindings and SYCL-based Python Array API library ☆117 Updated last week
- ☆47 Updated last week
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆470 Updated 2 weeks ago
- ☆61 Updated this week
- ☆51 Updated 2 weeks ago
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆89 Updated last month
- LLM training in simple, raw C/CUDA ☆108 Updated last year
- Python bindings for UCX ☆140 Updated last month
- POC work on MLIR backend ☆61 Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆63 Updated 3 weeks ago
- Kernel Tuner ☆372 Updated last week
- GitHub Action to install CUDA ☆192 Updated 2 weeks ago
- ☆80 Updated last week
- HIP Python Low-level Bindings ☆30 Updated last week
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆211 Updated 2 weeks ago
- RAPIDS Memory Manager ☆656 Updated this week
- High-Performance SGEMM on CUDA devices ☆110 Updated 9 months ago
- A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python ☆333 Updated last year
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆74 Updated last month
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆321 Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆104 Updated last week