curtisseizert / CUDA-uint128
A 128-bit unsigned integer class for CUDA
☆46 · Updated 5 months ago
Alternatives and similar repositories for CUDA-uint128
Users who are interested in CUDA-uint128 are comparing it to the repositories listed below.
- CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups ☆214 · Updated 3 months ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆81 · Updated 5 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 · Updated 7 years ago
- Kernel Tuning Toolkit ☆59 · Updated 3 weeks ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆23 · Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆52 · Updated 2 months ago
- ☆57 · Updated last week
- CUDA accelerated(X) Multi-Precision library ☆90 · Updated 8 years ago
- The CUDA Multiple Precision Arithmetic Library ☆46 · Updated 12 years ago
- SYCL Reference Manual ☆28 · Updated last year
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆51 · Updated 3 weeks ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- SYCL Benchmark Suite ☆64 · Updated 3 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆120 · Updated this week
- Reusable software components for ROCm developers ☆84 · Updated this week
- Next generation LAPACK implementation for ROCm platform ☆101 · Updated this week
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 · Updated 9 years ago
- assembler for NVIDIA FERMI. Imported from Google Code ☆72 · Updated 10 years ago
- RAND library for HIP programming language ☆120 · Updated last week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 · Updated 3 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆119 · Updated 2 years ago
- CUDA kernel author's tools ☆111 · Updated 3 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆89 · Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 · Updated last year
- Advanced Profiling and Analytics for AMD Hardware ☆156 · Updated this week
- C++ convenience classes to be used with CUDA code, for both the host and the kernel parts. ☆55 · Updated 6 years ago
- ☆23 · Updated 3 years ago
- ROCm Parallel Primitives ☆172 · Updated this week
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 4 years ago