Bruce-Lee-LY/cuda_back2back_hgemm

Bruce-Lee-LY / cuda_back2back_hgemm

Uses tensor cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.

☆13

Alternatives and similar repositories for cuda_back2back_hgemm

Users that are interested in cuda_back2back_hgemm are comparing it to the libraries listed below.

Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
eth-cscs / spla
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…
☆32 · Jun 26, 2024 · Updated 2 years ago
krabicezpapundeklu / lemon-parser
Lemon is an LALR(1) parser generator for C or C++.
☆17 · Jun 10, 2014 · Updated 12 years ago
SuperScientificSoftwareLaboratory / TileSpMV
Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang…
☆13 · Aug 12, 2022 · Updated 3 years ago
AnonymousRepo123 / AlphaSparse
An intelligent matrix format designer for SpMV
☆10 · Oct 10, 2023 · Updated 2 years ago
mumu12641 / strawberry
🍓 A toy object-oriented programming language written in Rust
☆17 · Apr 10, 2024 · Updated 2 years ago
matheusgomes28 / 65k-cpp
6502 Emulator written in C++
☆13 · Feb 18, 2025 · Updated last year
XuanYang-cn / pyetcd
Python client for the etcd API v3; supports Python >= 3.7 and is under active maintenance
☆13 · Aug 4, 2025 · Updated 11 months ago
LucasWilkinson / ASpT-mirror
Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding
☆17 · Oct 20, 2021 · Updated 4 years ago
Ivanrs297 / cuda-spmv-csr
Parallel SpMV using CSR representation, built in CUDA
☆14 · Jun 27, 2020 · Updated 6 years ago
Agzs / hyperledger
blockchain open sources
☆11 · Aug 18, 2017 · Updated 8 years ago
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆62 · Nov 24, 2023 · Updated 2 years ago
lijiaocn / kube-tools
Kubernetes debugging and inspection tools
☆13 · Nov 8, 2018 · Updated 7 years ago
slongle / GPU-Renderer
Offline renderer using CUDA
☆13 · Jun 8, 2020 · Updated 6 years ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆558 · Sep 8, 2024 · Updated last year
Ipsedo / MusicGAN
Music GAN - GANSynth preprocessing, ProGAN and DCGAN architecture
☆11 · Jan 26, 2023 · Updated 3 years ago
MingliSun / MLIR-TVM
☆13 · Nov 25, 2019 · Updated 6 years ago
josehu07 / cuckoo-hashing-CUDA
Parallel cuckoo hashing on GPUs with CUDA
☆12 · Sep 27, 2019 · Updated 6 years ago
nullplay / Unified-Convolution-Framework
☆10 · Apr 24, 2023 · Updated 3 years ago
minhhn2910 / cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
☆19 · May 28, 2019 · Updated 7 years ago
ModelTC / pyvlova
Yet another Polyhedra Compiler for DeepLearning
☆19 · Apr 14, 2023 · Updated 3 years ago
vancemiller / CUDA-preemption
Experiments evaluating preemption on the NVIDIA Pascal architecture
☆16 · Nov 10, 2016 · Updated 9 years ago
hummingtree / cuda-graph-with-dynamic-parameters
☆17 · Aug 9, 2022 · Updated 3 years ago
j-levy / bwa-gasal2
BWA-MEM program accelerated with the GASAL2 library
☆19 · Sep 2, 2019 · Updated 6 years ago
sinshu / odinysynth
A SoundFont MIDI synthesizer written in pure Odinlang
☆11 · Aug 13, 2023 · Updated 2 years ago
Bruce-Lee-LY / matrix_multiply
Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
☆14 · Feb 8, 2023 · Updated 3 years ago
Ratbuyer / h100-features
☆18 · Mar 12, 2025 · Updated last year
10aded / Odin-Vulkan-GLFW-Rainbow-Triangle
A "minimal" example of a Vulkan rainbow triangle in Odin with GLFW.
☆13 · Jun 2, 2024 · Updated 2 years ago
rj45 / gosie_c
A Data Oriented C Compiler in C
☆25 · Mar 28, 2024 · Updated 2 years ago
merrymercy / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆11 · Mar 25, 2024 · Updated 2 years ago
xiuxiazhang / KeplerAs
An Open Source Kepler GPU Assembler
☆22 · Jan 23, 2017 · Updated 9 years ago
rickyzhang82 / linux-device-driver-introduction-and-practice
Linux Driver Development: Introduction and Practice, 2nd Edition
☆19 · Aug 4, 2017 · Updated 8 years ago
nsood / LinuxDriver_CodeList
Code listings for the book "Linux Device Driver Development Explained in Detail"
☆18 · Apr 9, 2015 · Updated 11 years ago
SuperScientificSoftwareLaboratory / DASP
Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…
☆29 · Jun 18, 2024 · Updated 2 years ago
hclhkbu / gcoospdm
Sparse-dense matrix-matrix multiplication on GPUs
☆14 · Oct 15, 2018 · Updated 7 years ago
Chair-for-Security-Engineering / ecmongpu
ECM Factorization on CUDA-GPUs
☆16 · Sep 29, 2020 · Updated 5 years ago
yassram / iterative-closest-point
Iterative closest point GPU and CPU implementations (google benchmark)
☆19 · Nov 3, 2020 · Updated 5 years ago
lkawka / 3d-nearest-neighbor-search-in-kd-tree-cuda
Finding the nearest neighbor for 3d points in KD tree. Two implementations: nn.cpp (CPU) and nn.cu (CUDA GPU).
☆14 · Feb 18, 2022 · Updated 4 years ago
houminz / paper-reading
Paper Reading: covers distributed systems, virtualization, networking, and machine learning
☆22 · Sep 27, 2020 · Updated 5 years ago