minhhn2910 / cuda-half2

Convert CUDA programs from float data type to half or half2 with SIMDization

☆20

Alternatives and similar repositories for cuda-half2

Users who are interested in cuda-half2 are comparing it to the repositories listed below.

ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆105Updated 7 years ago
gevtushenko / cuda_benchmark
A library to benchmark CUDA code, similar to google benchmark.
☆29Updated 4 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113Updated last year
bryancatanzaro / trove
Full-speed Array of Structures access
☆171Updated 2 years ago
lukeyeager / cmake-cuda-example
Example of how to use CUDA with CMake >= 3.8
☆70Updated 2 weeks ago
ap-hynninen / cutt
CUDA Tensor Transpose (cuTT) library
☆52Updated 7 years ago
chai-benchmarks / chai
Chai
☆44Updated last year
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70Updated 8 years ago
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆81Updated 5 years ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆28Updated 8 years ago
shoaibkamil / stencilprobe
Stencil Probe - a stencil microbenchmark
☆30Updated 12 years ago
maps-gpu / MAPS
GPU Optimization and Memory Abstraction Framework
☆32Updated 5 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆32Updated 4 years ago
daadaada / gas
☆44Updated 4 years ago
apc-llc / nvcc-llvm-ir
Enabling on-the-fly manipulations with LLVM IR code of CUDA sources
☆111Updated 2 months ago
crosetto / cupq
a CUDA implementation of a priority queue
☆84Updated 4 years ago
inducer / islpy
Python wrapper for isl, an integer set library
☆77Updated last week
haanjack / mnist-cudnn
CUDA for MNIST training/inference
☆41Updated last year
tbennun / mgbench
Multi-GPU Computing Benchmark Suite (CUDA)
☆42Updated 8 years ago
eyalroz / cuda-kat
CUDA kernel author's tools
☆111Updated 3 years ago
tue-es / gpu-cache-model
A GPU cache model for research purposes
☆28Updated 11 years ago
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆66Updated 2 months ago
kajalv / nvml-power
Power measurement for CUDA programs by polling using NVIDIA Management Library (nvml) APIs.
☆24Updated 8 years ago
andersy005 / tvm-in-action
TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together
☆64Updated 7 years ago
bwasti / pytorch_compiler_tutorial
Codebase associated with the PyTorch compiler tutorial
☆46Updated 5 years ago
CNugteren / CLTune
CLTune: An automatic OpenCL & CUDA kernel tuner
☆179Updated 2 years ago
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆133Updated last year
Meinersbur / ppcg
Polyhedral Parallel Code Generation (source repository: http://repo.or.cz/ppcg.git)
☆126Updated 2 years ago
sderek / CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
☆49Updated 6 years ago
ROCm / rocPRIM
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆172Updated this week