piojanu / CUDA-im2col-conv

CUDA project for uni subject

☆26

Alternatives and similar repositories for CUDA-im2col-conv

Users who are interested in CUDA-im2col-conv compare it to the libraries listed below.

UDC-GAC / openCNN
A Winograd Minimal Filter Implementation in CUDA
☆28 Updated 4 years ago
lixiuhong / batched_gemm
☆39 Updated 5 years ago
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆54 Updated last year
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆92 Updated 2 years ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆28 Updated last year
sunlex0717 / DissectingTensorCores
☆109 Updated last year
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆141 Updated 2 years ago
lenLRX / AmpereSparseMatmul
Study of Ampere's Sparse Matmul
☆18 Updated 4 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆109 Updated 3 years ago
lixiuhong / implicit_gemm_convolution
☆14 Updated 6 years ago
BoyuanFeng / APNN-TC
☆19 Updated 4 years ago
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆200 Updated 3 years ago
Qualcomm-AI-research / FP8-quantization
☆163 Updated 2 years ago
zeroine / cutlass-cute-sample
☆47 Updated last year
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆38 Updated 7 months ago
marsupialtail / gpu-sparsert
☆18 Updated 5 years ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59 Updated 7 months ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
pku-liang / TileFlow
TileFlow is a performance analysis tool based on Timeloop for fusion dataflows
☆62 Updated last year
njuhope / cuda_sgemm
☆116 Updated last year
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆116 Updated 3 years ago
microsoft / SparTA
☆159 Updated last year
uiuc-arc / felix
Optimize tensor programs fast with Felix, a gradient descent autotuner.
☆28 Updated last year
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆68 Updated last year
nicolaswilde / cuda-tensorcore-hgemm
☆156 Updated 10 months ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆145 Updated 5 years ago
UofT-EcoSystem / DietCode
DietCode Code Release
☆65 Updated 3 years ago
uwsampl / sparsetir-artifact
Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"
☆26 Updated 2 years ago
FdyCN / PTX-ISA
CUDA PTX-ISA Document, Chinese translation
☆47 Updated last month
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆111 Updated 11 months ago