brian-kelley / CUDA-QR

A new QR decomposition algorithm implemented in CUDA

☆17

Alternatives and similar repositories for CUDA-QR

Users who are interested in CUDA-QR are comparing it to the libraries listed below.

GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆57 Updated 2 years ago
poojahira / spmv-cuda
Implementation and analysis of five different GPU based SPMV algorithms in CUDA
☆41 Updated 6 years ago
ndd314 / cuda_examples
☆68 Updated 11 years ago
owensgroup / BGHT
BGHT: High-performance static GPU hash tables.
☆70 Updated last month
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆270 Updated 4 years ago
mark-poscablo / gpu-sum-reduction
CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.
☆37 Updated 8 years ago
mattdean1 / cuda
An implementation of parallel exclusive scan in CUDA
☆62 Updated 7 years ago
sleeepyjack / warpcore
A Library for fast Hash Tables on GPUs
☆125 Updated 3 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆138 Updated 4 years ago
owensgroup / SlabHash
A warp-oriented dynamic hash table for GPUs
☆74 Updated last year
dumerrill / merge-spmv
☆94 Updated 8 years ago
horizon-research / rtnn
☆67 Updated 2 years ago
weifengliu-ssslab / Benchmark_SpGEMM_using_CSR
CSR-based SpGEMM on nVidia and AMD GPUs
☆46 Updated 9 years ago
NVIDIA / nsight-training
Training material for Nsight developer tools
☆163 Updated last year
PASSIONLab / MaskedSpGEMM
☆9 Updated 3 years ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
FZJ-JSC / tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
☆287 Updated last month
KernelTuner / kernel_tuner
Kernel Tuner
☆357 Updated 2 weeks ago
SuperScientificSoftwareLaboratory / TileSpGEMM
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y…
☆40 Updated last year
chenxuhao / caffe-escoin
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
☆16 Updated 6 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆35 Updated 5 years ago
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆363 Updated 3 years ago
cwpearson / nvidia-performance-tools
Instructions, Docker images, and examples for Nsight Compute and Nsight Systems
☆131 Updated 5 years ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆370 Updated 7 months ago
mark-poscablo / gpu-prefix-sum
CUDA implementation of exclusive prefix sum via Blelloch's algorithm
☆28 Updated 8 years ago
deeperlearning / professional-cuda-c-programming
☆453 Updated 10 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆226 Updated 3 years ago
RichardAns / CUDA-Programs
Examples from Programming in Parallel with CUDA
☆158 Updated 2 years ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆72 Updated 4 years ago
Huanghongru / SGEMM-Implementation-and-Optimization
Some source code about matrix multiplication implementation on CUDA
☆34 Updated 6 years ago