NumPy and SciPy on Multi-Node Multi-GPU systems
☆967 · Mar 18, 2026 · Updated this week
Alternatives and similar repositories for cupynumeric
Users who are interested in cupynumeric are comparing it to the libraries listed below.
- The Foundation for All Legate Libraries · ☆238 · Updated this week
- An Aspiring Drop-In Replacement for Pandas at Scale · ☆74 · Oct 19, 2021 · Updated 4 years ago
- Legate Hello World Pedagogical Library · ☆10 · Apr 5, 2023 · Updated 2 years ago
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … · ☆24 · Feb 14, 2026 · Updated last month
- NumPy & SciPy for GPU · ☆10,847 · Updated this week
- The Legion Parallel Programming System · ☆752 · Dec 17, 2025 · Updated 3 months ago
- CUDA Python: Performance meets Productivity · ☆3,185 · Updated this week
- An efficient C++20 GPU numerical computing library with Python-like syntax · ☆1,406 · Updated this week
- A Python framework for accelerated simulation, data generation and spatial computing. · ☆6,326 · Updated this week
- The CUDA target for Numba · ☆263 · Updated this week
- NVIDIA Math Libraries for the Python Ecosystem · ☆553 · Mar 11, 2026 · Updated last week
- cuDF - GPU DataFrame Library · ☆9,558 · Updated this week
- cuML - RAPIDS Machine Learning Library · ☆5,148 · Updated this week
- A flyweight in situ visualization and analysis runtime for multi-physics HPC simulations · ☆236 · Updated this week
- Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more · ☆35,108 · Updated this week
- CUDA Core Compute Libraries · ☆2,217 · Updated this week
- Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python · ☆519 · Mar 2, 2026 · Updated 2 weeks ago
- Development repository for the Triton language and compiler · ☆18,656 · Updated this week
- nanobind: tiny and efficient C++/Python bindings · ☆3,408 · Mar 10, 2026 · Updated last week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra · ☆9,442 · Updated this week
- ☆627 · Mar 12, 2026 · Updated last week
- functorch is JAX-like composable function transforms for PyTorch. · ☆1,437 · Aug 21, 2025 · Updated 6 months ago
- KvikIO - High Performance File IO · ☆255 · Updated this week
- RAJA Performance Portability Layer (C++) · ☆570 · Updated this week
- [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl · ☆2,308 · Feb 7, 2024 · Updated 2 years ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… · ☆518 · Updated this week
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). · ☆569 · Sep 15, 2025 · Updated 6 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… · ☆179 · Dec 16, 2025 · Updated 3 months ago
- RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-a… · ☆989 · Updated this week
- ⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization. · ☆974 · Updated this week
- Parallel solvers for sparse linear systems featuring multigrid methods. · ☆822 · Updated this week
- Microbenchmarks showing relative performance of different Python functions/patterns. · ☆13 · Oct 3, 2025 · Updated 5 months ago
- An MPI wrapper for the pytorch tensor library that is automatically differentiable · ☆10 · Mar 27, 2023 · Updated 2 years ago
- cuGraph - RAPIDS Graph Analytics Library · ☆2,143 · Updated this week
- Data and reproducibility scripts for the UoB-HPC Performance Portability studies · ☆18 · May 23, 2024 · Updated last year
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster · ☆874 · Sep 26, 2025 · Updated 5 months ago
- Extending JAX with custom C++ and CUDA code · ☆403 · Aug 18, 2024 · Updated last year
- GBM implementation on Legate · ☆14 · Jan 28, 2026 · Updated last month
- DaCe - Data Centric Parallel Programming · ☆579 · Mar 13, 2026 · Updated last week