ardenma/implicit-gemm-tensor-core-convolution

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ardenma/implicit-gemm-tensor-core-convolution)

ardenma / implicit-gemm-tensor-core-convolution

Simple example of how to write an Implicit GEMM Convolution in CUDA using the tensor core WMMA API and bindings for PyTorch.

☆18

Alternatives and similar repositories for implicit-gemm-tensor-core-convolution

Users that are interested in implicit-gemm-tensor-core-convolution are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

lixiuhong / implicit_gemm_convolution
View on GitHub
☆14May 28, 2019Updated 6 years ago
comfysage / saku
View on GitHub
a tiny distro-independent package manager written in Rust.
☆16Jun 22, 2024Updated last year
apuaaChen / vectorSparse
View on GitHub
☆32Aug 24, 2022Updated 3 years ago
YukeWang96 / QGTC_PPoPP22
View on GitHub
Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.
☆30Feb 12, 2022Updated 4 years ago
tgautam03 / tGeMM
View on GitHub
General Matrix Multiplication using NVIDIA Tensor Cores
☆28Jan 25, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
pulp-platform / cluster_interconnect
View on GitHub
☆17Apr 23, 2026Updated 3 weeks ago
HL-hanlin / GKAT
View on GitHub
☆11Apr 16, 2023Updated 3 years ago
ubugeeei / ubcc
View on GitHub
A toy C Compiler implemented by Rust.
☆19Feb 4, 2023Updated 3 years ago
EnigmaHuang / Saad_Book_ForTran
View on GitHub
Some "Formula Translations" for Yousef Saad's book "Iterative Methods for Sparse Linear Systems (2nd Edition)"
☆13Jan 14, 2018Updated 8 years ago
sfilippone / mld2p4-2
View on GitHub
☆14Jul 16, 2020Updated 5 years ago
chemeng / GPGPU-GMRES-Method
View on GitHub
CUDA GPU implementation of GMRES iterative Solver
☆10Apr 16, 2012Updated 14 years ago
wmmae / wmma_extension
View on GitHub
An extension library of WMMA API (Tensor Core API)
☆113Jul 12, 2024Updated last year
olcf / NVIDIA-tensor-core-examples
View on GitHub
☆20Nov 7, 2019Updated 6 years ago
Cr0a3 / rllvm
View on GitHub
LLVM alternative in Rust
☆15May 20, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
aniketp / ipcp-dsp
View on GitHub
Instruction Pointer Classifier and Dynamic Degree Stream based Hardware Cache Prefetching
☆16Nov 16, 2019Updated 6 years ago
SuperScientificSoftwareLaboratory / TileSpGEMM
View on GitHub
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y…
☆47May 22, 2024Updated last year
gevtushenko / block_matrix_format_performance
View on GitHub
☆12Jan 19, 2020Updated 6 years ago
gouarin / GenEO
View on GitHub
☆12Jan 13, 2023Updated 3 years ago
kurenaif / auto_wmake
View on GitHub
OpenFOAM right wmake at the right time
☆11Mar 10, 2019Updated 7 years ago
ece-fast-lab / ISCA-2025-LIA
View on GitHub
[ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
☆24Jan 6, 2026Updated 4 months ago
revilokeb / vgg16_batchnorm
View on GitHub
VGG16 architecture with BatchNorm
☆14Apr 4, 2017Updated 9 years ago
temporal-hpc / reduction-tensor-cores
View on GitHub
Fast GPU based tensor core reductions
☆13Jan 13, 2023Updated 3 years ago
tgautam03 / CUDA-C
View on GitHub
Simple problems implemented in CUDA C
☆35Apr 7, 2025Updated last year
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
zkat / supports-color
View on GitHub
Detects whether a terminal supports color, and gives details about that support
☆53Nov 26, 2024Updated last year
lorentzbf / OpSparse
View on GitHub
Source code of the paper "OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs"
☆16Aug 23, 2022Updated 3 years ago
YusukeNagasaka / Batched-SpMM
View on GitHub
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16May 7, 2019Updated 7 years ago
arirepo / paraGMRES
View on GitHub
Massively Scalable Parallel GMRES C-code for Sparse System of Equations
☆13Feb 16, 2016Updated 10 years ago
lionleaf / parallel-c-programs
View on GitHub
A wide array of parallel programs using CUDA, OpenCL, MPI, OpenMP and pthreads.
☆14Jan 6, 2015Updated 11 years ago
columbia / neongoby
View on GitHub
NeonGoby alias analysis checker
☆15Jul 2, 2013Updated 12 years ago
heiligerl / AutoPET_Challenge_Submission
View on GitHub
☆11Sep 4, 2022Updated 3 years ago
vtsynergy / bb_segsort
View on GitHub
☆21Aug 21, 2023Updated 2 years ago
brian-kelley / CUDA-QR
View on GitHub
A new QR decomposition algorithm implemented in CUDA
☆18Jun 24, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
jsyk / PIP-Watch
View on GitHub
☆11Dec 13, 2014Updated 11 years ago
dgSPARSE / dgNN
View on GitHub
[Mlsys'22] Understanding gnn computational graph: A coordinated computation, io, and memory perspective
☆22Sep 11, 2023Updated 2 years ago
KnowingNothing / MatmulTutorial
View on GitHub
A Easy-to-understand TensorOp Matmul Tutorial
☆434Mar 5, 2026Updated 2 months ago
negativo17 / cuda
View on GitHub
NVIDIA Compute Unified Device Architecture Toolkit
☆15Mar 19, 2026Updated 2 months ago
hgyhungry / ShflBW_Sparse_NN
View on GitHub
☆16Nov 22, 2022Updated 3 years ago
ParCIS / Magicube
View on GitHub
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆92Nov 23, 2022Updated 3 years ago
littlebee / scatbot-edge-ai-shootout
View on GitHub
A place to chronical performance of various hardward and software solutions for using AI at the edge for object detection
☆10Sep 11, 2022Updated 3 years ago