NVIDIA / numba-cuda
☆35 · Updated this week
Related projects
Alternatives and complementary repositories for numba-cuda
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆20 · Updated this week
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆73 · Updated 4 months ago
- Graph-indexed Pandas DataFrames for analyzing hierarchical performance data ☆29 · Updated last week
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- Analyze graph/hierarchical performance data using pandas dataframes ☆107 · Updated 2 weeks ago
- Data Parallel Extension for Numba ☆77 · Updated last week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆44 · Updated 3 weeks ago
- Deploy Dask using MPI4Py ☆52 · Updated last month
- POC work on MLIR backend ☆48 · Updated 2 months ago
- OpenMP Offloading Validation & Verification Suite; official repository. We have migrated from Bitbucket! For documentation, results, pub… ☆54 · Updated this week
- Python bindings for OpenSHMEM ☆14 · Updated 3 weeks ago
- MPI accelerator-integrated communication extensions ☆32 · Updated last year
- Creates performance portable libraries with embedded source representations. ☆20 · Updated 4 months ago
- Data Parallel Extension for NumPy ☆99 · Updated this week
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆34 · Updated last month
- An Aspiring Drop-In Replacement for Pandas at Scale ☆73 · Updated 3 years ago
- A unified framework across multiple programming platforms ☆33 · Updated 4 months ago
- Analyze parallel execution traces using pandas dataframes ☆24 · Updated this week
- OpenMP vs Offload ☆21 · Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆99 · Updated this week
- YAKL is A Kokkos Layer: A simple C++ framework for performance portability and Fortran code porting ☆57 · Updated last week
- General Purpose Timing Library ☆32 · Updated 6 months ago
- HPCG benchmark based on ROCm platform ☆35 · Updated last week
- ROCm SPARSE marshalling library ☆69 · Updated this week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cuBLASXt, but ported to both NVIDIA and AMD GPUs. ☆22 · Updated last month
- Bandwidth test for ROCm ☆47 · Updated this week
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Updated 3 years ago
- KvikIO - High Performance File IO ☆156 · Updated this week
- ☆17 · Updated 9 months ago
- A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037 ☆39 · Updated 9 months ago