pigirons/cpufp

A CPU tool for benchmarking peak floating-point performance

☆586

Alternatives and similar repositories for cpufp

Users who are interested in cpufp are comparing it to the libraries listed below.

tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆743 · May 14, 2026 · Updated 2 months ago
flame / how-to-optimize-gemm
☆2,020 · Jul 29, 2023 · Updated 2 years ago
pigirons / sgemm_hsw
This is an implementation of sgemm_kernel on L1d cache.
☆233 · Feb 26, 2024 · Updated 2 years ago
flame / blislab
BLISlab: A Sandbox for Optimizing GEMM
☆571 · Jun 17, 2021 · Updated 5 years ago
pigirons / conv3x3_m1
A demo of how to write a high-performance convolution that runs on Apple silicon.
☆56 · Feb 8, 2022 · Updated 4 years ago
OpenPPL / ppl.nn
A primitive library for neural networks
☆1,367 · Nov 24, 2024 · Updated last year
BBuf / how-to-optimize-gemm
☆99 · May 20, 2026 · Updated 2 months ago
cloudcores / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆609 · Apr 20, 2023 · Updated 3 years ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,002 · Sep 19, 2024 · Updated last year
OpenPPL / ppl.llm.kernel.cuda
☆150 · Jan 9, 2025 · Updated last year
tpoisonooo / chgemm
symmetric int8 gemm
☆67 · Jun 7, 2020 · Updated 6 years ago
OpenPPL / ppl.cv
ppl.cv is a high-performance image processing library from OpenPPL that supports various platforms.
☆515 · Oct 30, 2024 · Updated last year
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆932 · Dec 30, 2024 · Updated last year
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
MegEngine / MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port.
☆482 · Oct 23, 2024 · Updated last year
MegEngine / MegPeak
☆256 · Sep 15, 2023 · Updated 2 years ago
OpenPPL / CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
☆85 · Mar 20, 2023 · Updated 3 years ago
merrymercy / awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
☆2,766 · Oct 19, 2024 · Updated last year
LeiWang1999 / tvm_gpu_gemm
Playing with GEMM in TVM
☆91 · Jul 22, 2023 · Updated 2 years ago
billmuch / matmul_perf_test
☆15 · Apr 15, 2022 · Updated 4 years ago
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆138 · Sep 25, 2023 · Updated 2 years ago
BBuf / tvm_mlir_learn
A collection of compiler learning resources.
☆2,759 · May 20, 2026 · Updated 2 months ago
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,073 · Jan 3, 2023 · Updated 3 years ago
apache / tvm
Open Machine Learning Compiler Framework
☆13,595 · Updated this week
NVIDIA / cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
☆10,104 · Updated this week
XiuYuLi / deepcore_source_code
Partial source code of deepcore v0.7
☆27 · Jun 28, 2020 · Updated 6 years ago
google / gemmlowp
Low-precision matrix multiplication
☆1,844 · Jan 29, 2024 · Updated 2 years ago
Cjkkkk / CUDA_gemm
A simple, high-performance CUDA GEMM implementation.
☆437 · Jan 4, 2024 · Updated 2 years ago
ARM-software / ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologi…
☆3,171 · Jul 8, 2026 · Updated last week
buddy-compiler / buddy-mlir
An MLIR-based compiler framework bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
☆742 · Updated this week
OAID / AutoKernel
AutoKernel is a simple, easy-to-use, low-barrier automatic operator optimization tool that improves the efficiency of deploying deep learning algorithms.
☆748 · Sep 23, 2022 · Updated 3 years ago
uxlfoundation / oneDNN
oneAPI Deep Neural Network Library (oneDNN)
☆4,024 · Updated this week
huawei-noah / bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
☆958 · Apr 11, 2025 · Updated last year
fish98 / CAShift
CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift (FSE 2025)
☆15 · Jun 25, 2026 · Updated 3 weeks ago
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆86 · Oct 8, 2019 · Updated 6 years ago
Ldpe2G / ArmNeonOptimization
Arm NEON optimization practice
☆393 · Dec 22, 2020 · Updated 5 years ago
libxsmm / libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
☆968 · Updated this week
OpenMathLib / OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
☆7,521 · Updated this week
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 · Jun 21, 2026 · Updated 3 weeks ago