curtisseizert / CUDA-uint128
A 128-bit unsigned integer class for CUDA
☆43 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for CUDA-uint128
- CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups ☆206 · Updated last month
- CUDA kernel author's tools ☆109 · Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 · Updated last year
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly. ☆107 · Updated last year
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels. ☆70 · Updated 9 years ago
- CUDA accelerated(X) Multi-Precision library ☆87 · Updated 8 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆78 · Updated 5 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 3 years ago
- Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SYCL - all it takes to sum a lot of numbers fast! ☆73 · Updated 6 months ago
- Full-speed Array of Structures access ☆162 · Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 · Updated 6 months ago
- ☆16 · Updated 3 years ago
- Kernel Tuning Toolkit ☆55 · Updated 3 weeks ago
- SYCL Conformance Tests ☆62 · Updated last week
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated 3 weeks ago
- ☆50 · Updated 5 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆83 · Updated 9 months ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆34 · Updated 9 years ago
- Short examples illustrating AVX2 intrinsics for simple tasks. ☆83 · Updated 8 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆100 · Updated this week
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆52 · Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 · Updated 10 months ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆88 · Updated last week
- The CUDA Multiple Precision Arithmetic Library ☆44 · Updated 12 years ago
- Power measurement for CUDA programs by polling using NVIDIA Management Library (NVML) APIs. ☆23 · Updated 7 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆128 · Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆124 · Updated last year
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆102 · Updated last year
- Flexible GPGPU instrumentation ☆86 · Updated 5 years ago