NVIDIA / numbast
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
☆44 · Updated this week
Alternatives and similar repositories for numbast:
Users who are interested in numbast compare it to the libraries listed below:
- The CUDA target for Numba ☆112 · Updated this week
- Data Parallel Extension for Numba ☆81 · Updated 5 months ago
- Analyze graph/hierarchical performance data using pandas dataframes ☆114 · Updated 3 months ago
- Copy-hiding array abstraction to automatically migrate data between memory spaces ☆107 · Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆114 · Updated 4 months ago
- DLA-Future ☆72 · Updated this week
- Compiler agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA ☆87 · Updated this week
- Python bindings for OpenSHMEM ☆16 · Updated 2 weeks ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆54 · Updated 2 weeks ago
- AMD’s C++ library for accelerating tensor primitives ☆39 · Updated this week
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools ☆122 · Updated last week
- Reusable software components for ROCm developers ☆83 · Updated this week
- A unified framework across multiple programming platforms ☆37 · Updated 10 months ago
- Data Parallel Extension for NumPy ☆108 · Updated this week
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆108 · Updated this week
- Python SYCL bindings and SYCL-based Python Array API library ☆110 · Updated this week
- Distributed View Extension for Kokkos ☆45 · Updated 5 months ago
- ROCm SPARSE marshalling library ☆67 · Updated this week
- YAKL is A Kokkos Layer: A simple C++ framework for performance portability and Fortran code porting ☆66 · Updated 2 weeks ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆72 · Updated last month
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler… ☆29 · Updated 10 months ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆36 · Updated last month
- Next generation library for iterative sparse solvers for ROCm platform ☆81 · Updated last week
- Exploring using stdpar and Cython ☆33 · Updated 4 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆53 · Updated 2 months ago
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆81 · Updated 2 weeks ago
- Next generation LAPACK implementation for ROCm platform ☆100 · Updated this week
- Advanced Profiling and Analytics for AMD Hardware ☆154 · Updated this week