codyjrivera/tsm2x-imp

Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA

☆35

Alternatives and similar repositories for tsm2x-imp

Users who are interested in tsm2x-imp are comparing it to the libraries listed below.

PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆86 · Oct 8, 2019 · Updated 6 years ago
c3sr / tcu_scope
☆50 · Jun 27, 2019 · Updated 7 years ago
MatanHamilis / one_stencil
Multiple 1-stencil implementations using nvidia cuda.
☆12 · Dec 2, 2017 · Updated 8 years ago
MegEngine / cutlass-bak
modified cutlass
☆16 · Oct 26, 2020 · Updated 5 years ago
chenxuhao / caffe-escoin
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
☆16 · Feb 28, 2019 · Updated 7 years ago
xxcclong / GNN-Computing
Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations"
☆42 · Nov 16, 2021 · Updated 4 years ago
YusukeNagasaka / Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
☆16 · May 7, 2019 · Updated 7 years ago
lixiuhong / batched_gemm
☆40 · Feb 28, 2020 · Updated 6 years ago
hummingtree / cuda-graph-with-dynamic-parameters
☆17 · Aug 9, 2022 · Updated 3 years ago
AlphaSparse / Library
A sparse BLAS lib supporting multiple backends
☆51 · Mar 18, 2026 · Updated 4 months ago
HPCRL / ASPLOS_artifact
☆13 · Nov 1, 2021 · Updated 4 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆246 · Jan 13, 2022 · Updated 4 years ago
Ratbuyer / h100-features
☆18 · Mar 12, 2025 · Updated last year
Stefan20162016 / maxas-explained
maxas Scott Grey's maxas assembler sgemm explaining the (for me) missing parts https://github.com/NervanaSystems/maxas
☆17 · Dec 22, 2018 · Updated 7 years ago
apuaaChen / vectorSparse
☆32 · Aug 24, 2022 · Updated 3 years ago
anony-sub / chameleon
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
☆26 · Nov 7, 2019 · Updated 6 years ago
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆54 · Mar 24, 2024 · Updated 2 years ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 4 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 · Jun 28, 2020 · Updated 6 years ago
nikil-ravi / trt_tutorial
A tutorial on inference optimization using TensorRT
☆20 · Dec 24, 2024 · Updated last year
hyqneuron / asfermi
assembler for NVIDIA FERMI. Imported from Google Code
☆77 · Mar 22, 2015 · Updated 11 years ago
dingwentao / MILOF
Online Anomaly Detection for HPC Performance Data
☆11 · Jun 25, 2018 · Updated 8 years ago
aditya4d / gemm-vega64
Implement asm gemm on vega64 for 4096x4096 fp32 matrix
☆22 · Oct 12, 2019 · Updated 6 years ago
AnonymousYWL / MYLIB
☆18 · Apr 8, 2022 · Updated 4 years ago
rchardx / hopper-gemm
☆48 · Nov 1, 2025 · Updated 8 months ago
hpdps-group / ICS23-GPULZ
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs
☆16 · Apr 18, 2025 · Updated last year
minhhn2910 / cuda-half2
Convert CUDA programs from float data type to half or half2 with SIMDization
☆19 · May 28, 2019 · Updated 7 years ago
Leonardo-Ding / gpu_sgemm
☆17 · Jul 1, 2020 · Updated 6 years ago
pku-liang / popa
A unified programming framework for high and portable performance across FPGAs and GPUs
☆11 · Mar 23, 2025 · Updated last year
hgyhungry / ge-spmm
☆115 · Jul 3, 2021 · Updated 5 years ago
WeiCheng14159 / bazel-android-opencl
Run OpenCL program on MOBILE GPU (Qualcomm & ARM) !
☆18 · Jun 27, 2018 · Updated 8 years ago
decodecudabinary / Decoding-CUDA-Binary
☆55 · Nov 21, 2019 · Updated 6 years ago
pigirons / conv3x3_m1
This is a demo how to write a high performance convolution run on apple silicon
☆56 · Feb 8, 2022 · Updated 4 years ago
pku-liang / FlexTensor
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
☆184 · Apr 25, 2022 · Updated 4 years ago
wongsingfo / paper-util
Utilities for paper writing.
☆12 · Jan 11, 2026 · Updated 6 months ago
fanghao6666 / CUDA-Matirx-Multiplication
☆16 · May 30, 2019 · Updated 7 years ago
hpdps-group / hipSZ
A portable implementation of SZ lossy compression for AMD GPUs and Hygon DCUs.
☆11 · Feb 26, 2025 · Updated last year
szcompressor / FZ-GPU
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs
☆15 · Jun 21, 2026 · Updated last month
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,074 · Jan 3, 2023 · Updated 3 years ago