minhhn2910 / cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
☆20 · Updated 5 years ago
Related projects
Alternatives and complementary repositories for cuda-half2
- A GPU cache model for research purposes ☆26 · Updated 11 years ago
- Full-speed Array of Structures access ☆161 · Updated last year
- A framework that helps implementing swizzle GPU kernels ☆41 · Updated 4 years ago
- Enabling on-the-fly manipulations with LLVM IR code of CUDA sources ☆101 · Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- Chai ☆42 · Updated 11 months ago
- CUDAAdvisor: a GPU profiling tool ☆48 · Updated 6 years ago
- Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git) ☆117 · Updated 2 years ago
- LonestarGPU: Irregular algorithms parallelized for GPUs ☆33 · Updated 5 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 · Updated 6 months ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆78 · Updated 5 years ago
- GPU Optimization and Memory Abstraction Framework ☆32 · Updated 5 years ago
- A library to benchmark CUDA code, similar to google benchmark. ☆28 · Updated 3 years ago
- Python wrapper for isl, an integer set library ☆73 · Updated last week
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆83 · Updated 9 months ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 · Updated 7 years ago
- A Benchmark Suite for Heterogeneous System Computation ☆52 · Updated 3 weeks ago
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com ☆38 · Updated 11 months ago
- A domain-specific language and compiler for image processing ☆76 · Updated 3 years ago
- Kernel Tuning Toolkit ☆55 · Updated 3 weeks ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 · Updated 4 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 · Updated 10 months ago
- Power measurement for CUDA programs by polling using NVIDIA Management Library (nvml) APIs. ☆23 · Updated 7 years ago
- An experimental ahead of time compiler for Relay. ☆51 · Updated 4 years ago
- Polyhedral Extraction Tool (source repository: http://repo.or.cz/w/pet.git) ☆38 · Updated 2 years ago