enp1s0 / cutf
CUDA Template Functions
☆19 Updated 4 months ago
Alternatives and similar repositories for cutf:
Users who are interested in cutf are comparing it to the libraries listed below:
- Examples for using SYCL on CUDA ☆62 Updated 2 months ago
- AMD’s C++ library for accelerating tensor primitives ☆39 Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆60 Updated last month
- An extension library of WMMA API (Tensor Core API) ☆96 Updated 9 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆108 Updated this week
- ☆67 Updated 11 years ago
- Random number library that generates pseudo-random and quasi-random numbers. ☆26 Updated this week
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆37 Updated 7 years ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆47 Updated 3 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- CMake modules used within the ROCm libraries ☆66 Updated last week
- Reusable software components for ROCm developers ☆83 Updated this week
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line ☆23 Updated 2 weeks ago
- SYCL Reference Manual ☆27 Updated last year
- Accelerating DNN Convolutional Layers with Micro-batches ☆63 Updated 5 years ago
- cuASR: CUDA Algebra for Semirings ☆35 Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- hipFFT is an FFT marshalling library. ☆63 Updated last week
- Multiple-precision GPU accelerated linear algebra routines (dense and sparse) based on residue number system ☆17 Updated 2 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 Updated 4 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆68 Updated 2 years ago
- ☆36 Updated this week
- A thin wrapper around MIOpen and cuDNN ☆42 Updated last year
- ☆23 Updated 3 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 Updated 11 months ago
- rocWMMA ☆110 Updated this week
- Bandwidth test for ROCm ☆54 Updated 3 weeks ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research ☆109 Updated last year
- Tools and extensions for CUDA profiling ☆65 Updated 5 years ago