ameli / manylinux-cuda
manylinux docker images with CUDA Toolkit
☆10 · Updated last month
Alternatives and similar repositories for manylinux-cuda:
Users who are interested in manylinux-cuda are comparing it to the repositories listed below.
- GitHub Action to install CUDA ☆160 · Updated this week
- Pybind11 tool for making docstrings from C++ comments ☆40 · Updated 9 months ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- A Visual Studio Code Debug Extension for debugging mixed Python and C++ code. The extension starts a Python debug session and attaches th… ☆52 · Updated last year
- POC work on MLIR backend ☆50 · Updated 5 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆30 · Updated last month
- NPBench - A Benchmarking Suite for High-Performance NumPy ☆76 · Updated 2 months ago
- ☆16 · Updated 2 years ago
- High-Performance Reproducible BLAS using posit arithmetic ☆12 · Updated 2 years ago
- CUDA kernel author's tools ☆110 · Updated 2 years ago
- ☆31 · Updated this week
- Sympiler is a Code Generator for Transforming Sparse Matrix Codes ☆42 · Updated last year
- The CUDA target for Numba ☆43 · Updated this week
- associative floating point addition ☆17 · Updated 9 months ago
- Generate stubs for python modules ☆257 · Updated 7 months ago
- Header-only C++20 wrapper for MPI 4.0. ☆44 · Updated last year
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆104 · Updated this week
- xtensor plugin to read and write images, audio files, numpy (compressed) npz and HDF5 ☆84 · Updated 9 months ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 · Updated 3 years ago
- Instructions and templates for SC authors ☆16 · Updated 3 years ago
- Example of using pytorch's open device registration API ☆27 · Updated 2 years ago
- Data Parallel Extension for NumPy ☆101 · Updated this week
- Unit benchmarks of CUDA event APIs. ☆17 · Updated 9 months ago
- An extension library of WMMA API (Tensor Core API) ☆87 · Updated 6 months ago
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆48 · Updated this week
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 3 years ago
- Reference Implementation for stdBLAS ☆131 · Updated 2 weeks ago
- CuPBoP-AMD is a CUDA translator that translates CUDA programs at NVVM IR level to HIP-compatible IR that can run on AMD GPUs. ☆36 · Updated last year
- Sample projects demonstrating use of scikit-build ☆76 · Updated this week
- pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support. ☆68 · Updated this week