NVIDIA / numbast
Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
☆44 Updated this week
Alternatives and similar repositories for numbast:
Users who are interested in numbast are comparing it to the libraries listed below.
- The CUDA target for Numba ☆102 Updated this week
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy… ☆112 Updated 3 months ago
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆80 Updated 2 weeks ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- AMD’s C++ library for accelerating tensor primitives ☆39 Updated this week
- Data Parallel Extension for Numba ☆80 Updated 5 months ago
- Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools ☆121 Updated 3 months ago
- Compiler agnostic metaprogramming library providing concepts, type operations and tuples for C++ and CUDA ☆86 Updated this week
- DLA-Future ☆71 Updated this week
- Exploring using stdpar and Cython ☆33 Updated 4 years ago
- Distributed View Extension for Kokkos ☆45 Updated 4 months ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆53 Updated last month
- A hands-on introduction to tuning GPU kernels using Kernel Tuner https://github.com/KernelTuner/kernel_tuner/ ☆30 Updated last week
- Reusable software components for ROCm developers ☆83 Updated this week
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆51 Updated last month
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆107 Updated this week
- Next generation library for iterative sparse solvers for ROCm platform ☆79 Updated this week
- ☆46 Updated this week
- Benchmark of expression templates libraries ☆41 Updated 4 years ago
- YAKL is A Kokkos Layer: A simple C++ framework for performance portability and Fortran code porting ☆65 Updated 3 weeks ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- A unified framework across multiple programming platforms ☆36 Updated 10 months ago
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development. ☆36 Updated last week
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆78 Updated 8 months ago
- Next generation LAPACK implementation for ROCm platform ☆99 Updated this week
- Analyze graph/hierarchical performance data using pandas dataframes ☆113 Updated 2 months ago
- The Foundation for All Legate Libraries ☆213 Updated this week
- Data Parallel Extension for NumPy ☆105 Updated this week
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear … ☆72 Updated 3 weeks ago
- C++ HPC Tutorial materials ☆49 Updated 9 months ago