NVIDIA/numba-cuda

NVIDIA / numba-cuda

The CUDA target for Numba

☆290

Alternatives and similar repositories for numba-cuda

Users who are interested in numba-cuda are comparing it to the libraries listed below.

NVIDIA / numbast
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
☆61 · Updated this week
NVIDIA / nvmath-python
NVIDIA Math Libraries for the Python Ecosystem
☆589 · Updated this week
NVIDIA / cuda-python
CUDA Python: Performance meets Productivity
☆3,320 · Updated this week
NVIDIA / cccl
CUDA Core Compute Libraries
☆2,435 · Updated this week
gmarkall / life-of-a-numba-kernel
Worked example of the process from Python source to CUDA kernel execution with Numba
☆45 · Sep 11, 2024 · Updated last year
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
NVIDIA / cuEmbed
CUDA Embedding Lookup Kernel Library
☆48 · Jun 26, 2026 · Updated 3 weeks ago
numba / pixie
Creates performance portable libraries with embedded source representations.
☆31 · Dec 16, 2024 · Updated last year
ROCm / numba-hip
HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM.
☆22 · Jul 10, 2026 · Updated last week
NVIDIA / accelerated-computing-hub
NVIDIA curated collection of educational resources related to general purpose GPU programming.
☆1,847 · Updated this week
vortexgpgpu / Volt
☆17 · Feb 9, 2026 · Updated 5 months ago
NVIDIA / cutile-python
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
☆2,121 · Updated this week
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆547 · Updated this week
NVIDIA / cuCascade
GPU Memory Reservation Library
☆55 · Updated this week
kokkos / pykokkos
Performance portable parallel programming in Python backed by Kokkos
☆127 · Updated this week
helmholtz-analytics / mpi4torch
An MPI wrapper for the pytorch tensor library that is automatically differentiable
☆10 · Mar 27, 2023 · Updated 3 years ago
NVIDIA / tilus
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
☆489 · Jul 5, 2026 · Updated 2 weeks ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
mrocklin / dask-array
☆17 · Updated this week
tpapp / LogDensityProblemsAD.jl
AD backends for LogDensityProblems.jl.
☆13 · Jul 1, 2026 · Updated 2 weeks ago
NVIDIA / cuopt
GPU accelerated decision optimization
☆978 · Updated this week
IntelPython / numba-dpex
Data Parallel Extension for Numba
☆89 · Sep 26, 2025 · Updated 9 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
NVIDIA / nvshmem
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com…
☆560 · Updated this week
nv-legate / cupynumeric
NumPy and SciPy on Multi-Node Multi-GPU systems
☆980 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
HDFGroup / vol-async
Asynchronous I/O for HDF5
☆24 · Feb 10, 2026 · Updated 5 months ago
numba / numba-mlir
POC work on MLIR backend
☆61 · Aug 21, 2024 · Updated last year
rapidsai / rapids-cmake
☆47 · Updated this week
meta-pytorch / tlparse
TORCH_TRACE parser for PT2
☆90 · May 11, 2026 · Updated 2 months ago
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Updated this week
toyaix / triton-runner
Multi-Level Triton Runner supporting Python, IR, PTX, AMDGCN, cubin and hasco.
☆98 · May 8, 2026 · Updated 2 months ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆45 · Nov 22, 2024 · Updated last year
nv-legate / legate
The Foundation for All Legate Libraries
☆241 · Updated this week
NVIDIA / cuCollections
☆655 · Updated this week
NVIDIA / TileGym
Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming
☆776 · Updated this week
ROCm / tritonBLAS
A lightweight triton-based General Matrix Multiplication (GEMM) library.
☆65 · Jun 13, 2026 · Updated last month
Olympus-HPC / proteus
Programmable JIT Compilation and Optimization for C/C++ using LLVM
☆52 · Updated this week
llnl / H5Z-ZFP
A registered ZFP compression plugin for HDF5
☆55 · Updated this week