zhangkai0425/SGEMM-HPC

zhangkai0425 / SGEMM-HPC

Implementation and optimization of matrix multiplication on single CPU (HPC-THU-2023-Autumn)

☆18

Alternatives and similar repositories for SGEMM-HPC

Users that are interested in SGEMM-HPC are comparing it to the libraries listed below.

GetUpEarlier / minit
☆26 · May 27, 2024 · Updated 2 years ago
cyhdmjzzy / DeepEP-Code-Analysis
☆26 · Feb 27, 2026 · Updated 5 months ago
yifu-ding / BGEMM-CUDA
BGEMM-CUDA is a CUDA-based low-bit GEMM kernel library for efficient neural network inference. It implements optimized binary and ternary…
☆20 · Aug 30, 2024 · Updated last year
CAS-CLab / BlockConv
[TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA
☆17 · Jul 7, 2022 · Updated 4 years ago
Ther-nullptr / Awesome-Transformer-Accleration
Paper list for acceleration of transformers
☆14 · Jul 1, 2023 · Updated 3 years ago
PannenetsF / TQT
TQT's pytorch implementation.
☆22 · Dec 17, 2021 · Updated 4 years ago
Xtra-Computing / hacc_demo
☆18 · Sep 25, 2025 · Updated 10 months ago
ThisisBillhe / torch_quantizer
torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for diffusion models.
☆25 · Mar 29, 2024 · Updated 2 years ago
chuyiyao / Q-MAT
☆12 · Feb 7, 2018 · Updated 8 years ago
SWE-Gym / SWE-Bench-Fork
☆13 · Mar 5, 2025 · Updated last year
GATECH-EIC / DNN-Chip-Predictor
[ICASSP'20] DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architecture…
☆23 · Oct 1, 2022 · Updated 3 years ago
rinsa318 / normal2depth
Estimate depth from surface normal.
☆12 · Aug 14, 2020 · Updated 5 years ago
HMC-ACE / hspiceParser
A Python parser for hSpice output files and documentation of the hSpice output file format
☆27 · Mar 25, 2026 · Updated 4 months ago
GATECH-EIC / ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30 · Dec 6, 2023 · Updated 2 years ago
jgoeders / dac_sdc_2020_designs
Designs for finalist teams of the DAC System Design Contest
☆37 · Jul 8, 2020 · Updated 6 years ago
YangLinzhuo / cuda-sgemm-optimization
CUDA SGEMM optimization note
☆15 · Oct 31, 2023 · Updated 2 years ago
Qwesh157 / conv_op_optimization
This project is about convolution operator optimization on GPU, including GEMM-based (Implicit GEMM) convolution.
☆44 · Sep 29, 2025 · Updated 9 months ago
nbasyl / OFQ
The official implementation of the ICML 2023 paper OFQ-ViT
☆39 · Oct 3, 2023 · Updated 2 years ago
IVIPLab / LETNet
This repository is an official PyTorch implementation of our paper “LETNet: Lightweight-Real-time-Semantic-Segmentation-Network-with-Effi…
☆37 · Nov 5, 2023 · Updated 2 years ago
KarhouTam / cuda-kernels
Some common CUDA kernel implementations (Not the fastest).
☆30 · Jun 24, 2026 · Updated last month
ShaYeBuHui01 / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆15 · Aug 31, 2023 · Updated 2 years ago
DNN-Accelerators / Open-Source-IPs
☆35 · Mar 1, 2019 · Updated 7 years ago
tingyunaiai9 / Echo-of-Time
Tsinghua University School of Software, Fall 2025 "Software Engineering" course, major project of Groups 3, 4, and 5: Echo of Time
☆28 · Apr 12, 2026 · Updated 3 months ago
ryaoi / lldb-peda
just my way of printing info
☆16 · Feb 9, 2021 · Updated 5 years ago
sunxt99 / PIMCOMP-NN
☆77 · Feb 12, 2025 · Updated last year
junaire / run.cu
Compile & run a single CUDA file on the cloud GPUs
☆14 · Sep 8, 2024 · Updated last year
ziyeshanwai / python-laplacian-deformation
A Python version of Laplacian deformation
☆22 · Mar 10, 2020 · Updated 6 years ago
Shahriar-0 / Digital-Logic-Design-Lab-Experiments-S2023
Verilog implementation of different concepts in Digital Logic Design such as OTHFSM, AFG and Accelerators
☆11 · Dec 26, 2023 · Updated 2 years ago
eesast / THUAI5
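Tsinghua University 5th Artificial Intelligence Challenge, Department of Electronic Engineering track (formerly the department's 23rd team programming contest, teamstyle23)
☆21 · Apr 8, 2024 · Updated 2 years ago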
Inference-and-Optimization / High-Level-Synthesis-Study-Notes
Vivado HLS study notes, courses, documents.
☆12 · Dec 7, 2019 · Updated 6 years ago
harvard-acc / EdgeBERT
HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference
☆54 · Mar 24, 2024 · Updated 2 years ago
ZhangJingrong / gpu_topK_benchmark
GPU TopK Benchmark
☆18 · Dec 19, 2024 · Updated last year
xgqdut2016 / hpc_project
Some HPC projects for learning
☆28 · Aug 28, 2024 · Updated last year
gouzigouzi / attention-residuals-for-chinese-llms
A Chinese-focused PyTorch framework for exploring Attention Residuals in Qwen3-style causal LMs, with baseline, Block AttnRes, Full AttnR…
☆19 · May 3, 2026 · Updated 2 months ago
aim-uofa / model-quantization
Collections of model quantization algorithms. Any issues, please contact Peng Chen (blueardour@gmail.com)
☆45 · Aug 19, 2021 · Updated 4 years ago
harvard-acc / FlexASR
FlexASR: A Reconfigurable Hardware Accelerator for Attention-based Seq-to-Seq Networks
☆52 · May 20, 2026 · Updated 2 months ago
AlexwellChen / Toy_ML_Framework
☆11 · May 16, 2026 · Updated 2 months ago
menik1126 / Swing-Bench
[ICLR 2026 Oral] SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving
☆15 · Feb 26, 2026 · Updated 5 months ago
Keenuts / vulkan-compute
related to virglrender-vulkan: basic compute test application
☆19 · Feb 12, 2026 · Updated 5 months ago