billmuch/matmul_perf_test

billmuch / matmul_perf_test

☆15

Alternatives and similar repositories for matmul_perf_test

Users that are interested in matmul_perf_test are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

yester31 / Cutlass_EX
study of cutlass
☆22 · Nov 10, 2024 · Updated last year
weishengying / cutlass_flash_atten_fp8
Uses the cutlass repository to implement FP8 flash attention on the Ada architecture
☆82 · Aug 12, 2024 · Updated last year
LeiWang1999 / tvm_gpu_gemm
Playing with GEMM in TVM
☆91 · Jul 22, 2023 · Updated 3 years ago
NVIDIA / atex
A TensorFlow Extension: GPU performance tools for TensorFlow.
☆26 · Jul 27, 2023 · Updated 2 years ago
fanghao6666 / CUDA-Matirx-Multiplication
☆16 · May 30, 2019 · Updated 7 years ago
BBuf / how-to-optimize-gemm
☆99 · May 20, 2026 · Updated 2 months ago
sjtu-epcc / DVABatch
☆21 · May 13, 2022 · Updated 4 years ago
Oneflow-Inc / oneflow_convert
OneFlow->ONNX
☆42 · Apr 19, 2023 · Updated 3 years ago
AlibabaResearch / recom
An Optimizing Compiler for Recommendation Model Inference
☆26 · Jun 5, 2025 · Updated last year
HydraQYH / hp_rms_norm
High-performance RMSNorm implementation using SM core storage (registers and shared memory)
☆30 · Jan 22, 2026 · Updated 6 months ago
daquexian / faster-rwkv
☆126 · Dec 15, 2023 · Updated 2 years ago
ARM-software / HPCG_for_Arm
☆30 · Dec 16, 2022 · Updated 3 years ago
c3sr / tcu_scope
☆50 · Jun 27, 2019 · Updated 7 years ago
tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆743 · May 14, 2026 · Updated 2 months ago
DiscreteTom / dt-blog-boilerplate
DiscreteTom's Blog Boilerplate.
☆10 · Mar 6, 2023 · Updated 3 years ago
TensorflowXLABeginner / XLA-Report
This repository summarizes all of our work on XLA.
☆11 · Jan 14, 2018 · Updated 8 years ago
MegEngine / MegCC
MegCC is a deep learning model compiler that is ultra-lightweight at runtime, efficient, and easy to port
☆483 · Oct 23, 2024 · Updated last year
mochi-hpc / mochi-thallium
Thallium is a C++14 library wrapping Margo, Mercury, and Argobots and providing an object-oriented way to use these libraries.
☆16 · May 4, 2026 · Updated 2 months ago
tpoisonooo / chgemm
symmetric int8 gemm
☆67 · Jun 7, 2020 · Updated 6 years ago
banburytang / List-of-Chinese-Open-Source-Project-Financing
☆16 · Nov 2, 2022 · Updated 3 years ago
iree-org / iree-llvm-sandbox
A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM
☆62 · Apr 13, 2026 · Updated 3 months ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
lenLRX / AmpereSparseMatmul
Study of Ampere's sparse matmul
☆18 · Jan 10, 2021 · Updated 5 years ago
jeremyxu2010 / toy-compiler
A small personal project for learning compiler principles and understanding the main workflow of building a compiler
☆10 · Oct 7, 2020 · Updated 5 years ago
quiver-team / quiver-feature
High-performance, RDMA-based distributed feature collection component for training GNN models on extremely large graphs
☆55 · Jul 3, 2022 · Updated 4 years ago
acl-dev / demo
Demos written with acl and C/C++: coroutines, HTTP/HTTPS, servers, JSON, Redis, MySQL, networking, NIO, etc.
☆20 · Oct 8, 2025 · Updated 9 months ago
wu-kan / wuk_cupti_wrapper
A simple API for using CUPTI
☆10 · Aug 19, 2025 · Updated 11 months ago
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 · Jun 21, 2026 · Updated last month
shaoshitong / diffusion-model-learning
Demos and a series of documents for learning about diffusion models.
☆41 · Jun 29, 2023 · Updated 3 years ago
wkqscut / DCGNet
The code for IJCAI 2019 paper "Deep Cascade Generation on Point Sets"
☆14 · Oct 3, 2023 · Updated 2 years ago
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆437 · Jan 4, 2024 · Updated 2 years ago
BUAA-CI-LAB / GNN-Feature-Decomposition
Uses the feature decomposition method to accelerate GNN inference
☆13 · Sep 27, 2021 · Updated 4 years ago
masahi / torchscript-to-tvm
☆68 · Mar 4, 2023 · Updated 3 years ago
Adlik / model_zoo
☆11 · Dec 26, 2025 · Updated 6 months ago
SkyworkAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆17 · Jun 3, 2024 · Updated 2 years ago
xiaotianxia / vue-163news-dev
A Vue demo: a clone of the NetEase News mobile site built with Vue
☆10 · Jul 26, 2017 · Updated 8 years ago
L1aoXingyu / llm-infer-bench
☆12 · Sep 1, 2023 · Updated 2 years ago
howardlau1999 / autograd
A simple demonstration of how PyTorch autograd works
☆16 · Sep 23, 2021 · Updated 4 years ago
kmiku7 / python-2.5-annotated
python-2.5-annotated
☆14 · Jan 7, 2015 · Updated 11 years ago