eyalroz / cuda-kat

CUDA kernel author's tools

☆115

Alternatives and similar repositories for cuda-kat

Users who are interested in cuda-kat are comparing it to the libraries listed below.

bryancatanzaro / trove
Full-speed Array of Structures access
☆176 Updated 2 years ago
harrism / ranger
Generate simple index ranges in C++ and CUDA C++
☆39 Updated 2 years ago
ashvardanian / ParallelReductionsBenchmark
Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal, and Rust - all it takes to sum a lot of numbers fast!
☆114 Updated 5 months ago
codeplaysoftware / portBLAS
Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.
☆260 Updated 11 months ago
NERSC / timemory
Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template…
☆366 Updated last year
NVIDIA / jitify
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
☆567 Updated 3 months ago
jeffhammond / dpcpp-tutorial
Intel Data Parallel C++ (and SYCL 2020) Tutorial.
☆95 Updated 4 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113 Updated last year
alpaka-group / alpaka
Abstraction Library for Parallel Kernel Acceleration
☆399 Updated last week
harrism / hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
☆348 Updated 3 years ago
gevtushenko / cuda_benchmark
A library to benchmark CUDA code, similar to google benchmark.
☆30 Updated 4 years ago
ROCm / HIP-CPU
An implementation of HIP that works on CPUs, across OSes.
☆131 Updated last year
Kobzol / hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.
☆389 Updated 2 years ago
brycelelbach / cub_historical_2019_2020
Cooperative Primitives for CUDA C++ Kernel Authors. This repository contains CUB PRs from Q4 2019 until Q4 2020.
☆22 Updated 5 years ago
milakov / int_fastdiv
Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.
☆73 Updated 10 years ago
crosetto / cupq
A CUDA implementation of a priority queue
☆84 Updated 5 years ago
celerity / celerity-runtime
High-level C++ for Accelerator Clusters
☆154 Updated last month
agenium-scale / nsimd
Agenium Scale vectorization library for CPUs and GPUs
☆337 Updated 4 years ago
ogiroux / freestanding
☆71 Updated 5 years ago
harrism / cpp11-range
Range-based for loops to iterate over a range of numbers or values
☆34 Updated 9 years ago
ROCm / rocThrust
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆124 Updated this week
codeplaysoftware / SYCL-For-CUDA-Examples
Examples for using SYCL on CUDA
☆62 Updated 3 months ago
eyalroz / cuda-api-wrappers
Thin, unified, C++-flavored wrappers for the CUDA APIs
☆865 Updated last month
KhronosGroup / SYCL-Docs
SYCL Open Source Specification
☆141 Updated this week
intel / opencl-intercept-layer
Intercept Layer for Debugging and Analyzing OpenCL Applications
☆346 Updated last week
eyalroz / gpu-kernel-runner
Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line
☆24 Updated last month
unisa-hpc / sycl-bench
SYCL Benchmark Suite
☆66 Updated 6 months ago
HiPerCoRe / KTT
Kernel Tuning Toolkit
☆64 Updated last month
Apress / data-parallel-CPP
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A…
☆281 Updated 8 months ago
ROCm / omnitrace
Omnitrace: Application Profiling, Tracing, and Analysis
☆338 Updated 2 weeks ago