YashasSamaga/ConvolutionBuildingBlocks

GEMM- and Winograd-based convolutions using CUTLASS

☆28

Alternatives and similar repositories for ConvolutionBuildingBlocks

Users who are interested in ConvolutionBuildingBlocks are comparing it to the libraries listed below.

md2z34 / winograd_gpu
GPU implementation of Winograd convolution
☆10 · Oct 23, 2017 · Updated 8 years ago
LeiWang1999 / Stream-k.tvm
☆20 · Sep 28, 2024 · Updated last year
TiledTensor / TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19 · May 12, 2024 · Updated 2 years ago
jakobhartmann / tensor-eqs-mcts
Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search
☆15 · Aug 9, 2024 · Updated last year
latentCall145 / channels-last-groupnorm
A CUDA kernel for NHWC GroupNorm for PyTorch
☆23 · Nov 15, 2024 · Updated last year
c3sr / tcu_scope
☆50 · Jun 27, 2019 · Updated 7 years ago
jdnie / Winograd_study
Understanding the principles of the Winograd algorithm
☆10 · Apr 26, 2020 · Updated 6 years ago
rchardx / hopper-gemm
☆48 · Nov 1, 2025 · Updated 8 months ago
MegEngine / cutlass-bak
Modified CUTLASS
☆16 · Oct 26, 2020 · Updated 5 years ago
csl-iisc / MGVM-MICRO2022
☆12 · Oct 25, 2022 · Updated 3 years ago
marian-nmt / amun
Fast stand-alone C++ decoder for RNN-based NMT models
☆31 · Dec 12, 2020 · Updated 5 years ago
Tianshi-Xu / PrivCirNet
[NeurIPS'24] Official implementation of "PrivCirNet: Efficient Private Inference via Block Circulant Transformation"
☆14 · Feb 26, 2026 · Updated 4 months ago
merrymercy / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆11 · Mar 25, 2024 · Updated 2 years ago
YusukeNagasaka / Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16 · May 7, 2019 · Updated 7 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆32 · Dec 21, 2024 · Updated last year
ZipCPU / qoiimg
Quite OK image compression Verilog implementation
☆23 · Nov 27, 2024 · Updated last year
PKU-SEC-Lab / mpcvit
Code release for MPCViT accepted by ICCV 2023
☆16 · Jan 6, 2025 · Updated last year
SparseLinearAlgebra / spbla
Sparse Boolean linear algebra for Nvidia Cuda, OpenCL and CPU computations
☆16 · Aug 19, 2022 · Updated 3 years ago
jwuphysics / HI-convnets
Estimating galaxy gas mass fractions using SDSS imaging
☆13 · Nov 30, 2020 · Updated 5 years ago
ranran0523 / SPECNN
Code repo for paper accepted in ICML 2023
☆13 · Oct 19, 2023 · Updated 2 years ago
zhxchd / Blink_GNN
Code for CCS '23 paper "Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation"
☆16 · Nov 17, 2023 · Updated 2 years ago
KTTRCDL / graph-feature-selection
[ICLR 2025] Let Your Features Tell The Differences: Understanding Graph Convolution By Feature Splitting
☆16 · Nov 24, 2025 · Updated 7 months ago
zjzijielu / gnn-positional-structural-node-features
Official repository of "On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs", CIKM 2022
☆19 · Aug 23, 2022 · Updated 3 years ago
VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Dec 29, 2023 · Updated 2 years ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆73 · Oct 5, 2020 · Updated 5 years ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 · Oct 5, 2024 · Updated last year
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆126 · Jun 23, 2022 · Updated 4 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33 · Feb 10, 2025 · Updated last year
haguettaz / ChebLieNet
ChebLieNet, a spectral graph neural network turned equivariant by Riemannian geometry on Lie groups.
☆14 · Aug 20, 2024 · Updated last year
Meinersbur / pet
Polyhedral Extraction Tool (source repository: http://repo.or.cz/w/pet.git)
☆42 · Jul 22, 2022 · Updated 3 years ago
xnd-project / cuda-benchmarks
Collection of CUDA benchmarks, with a focus on unified vs. explicit memory management.
☆21 · Oct 15, 2019 · Updated 6 years ago
IBM / cmnnc
Computational Memory Neural Network Compiler
☆11 · Aug 11, 2021 · Updated 4 years ago
jucaleb4 / Bilinear-Algorithms-for-Convolution
A Python module for generating fast bilinear algorithms for different convolution algorithms
☆16 · Feb 29, 2024 · Updated 2 years ago
UofT-EcoSystem / DietCode
DietCode Code Release
☆65 · Jul 21, 2022 · Updated 3 years ago
gussmith23 / glenside
A pure, low-level tensor program representation enabling tensor program optimization via program rewriting. See the web demo at https://g…
☆76 · May 30, 2025 · Updated last year
UDC-GAC / openCNN
A Winograd Minimal Filter Implementation in CUDA
☆31 · Aug 25, 2021 · Updated 4 years ago
microsoft / cusync
☆27 · Feb 20, 2024 · Updated 2 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆183 · May 26, 2019 · Updated 7 years ago