carlushuang/cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

☆75

Alternatives and similar repositories for cpu_gemm_opt

Users who are interested in cpu_gemm_opt are comparing it to the libraries listed below.

wjc404 / GEMM_AVX512F
SGEMM and DGEMM subroutines using AVX512F instructions.
☆15 · Updated May 22, 2022 (4 years ago)
zhiqwang / shufaCV
☆26 · Updated May 22, 2023 (3 years ago)
flame / how-to-optimize-gemm
☆2,025 · Updated Jul 29, 2023 (3 years ago)
tpoisonooo / how-to-optimize-gemm
Row-major matmul optimization.
☆744 · Updated May 14, 2026 (2 months ago)
dag10 / Logisim_CPU
Made a CPU in Logisim when I was 14 (2009), and wrote a naive assembler and compiler for it in Flash. The CPU's design is inspired by Don…
☆10 · Updated Sep 30, 2016 (9 years ago)
BUG1989 / ncnn-benchmark
Benchmarks for ncnn, a high-performance neural network inference framework optimized for mobile platforms.
☆72 · Updated Mar 8, 2019 (7 years ago)
zuoqing1988 / pytorch-ssd-for-ZQCNN
Trains SSD with PyTorch, with quite a few changes compared with the original pytorch-ssd.
☆11 · Updated Jul 4, 2022 (4 years ago)
mmperf / mmperf
MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels.
☆138 · Updated Sep 25, 2023 (2 years ago)
sjubertie / teaching-SIMD
Lecture on SIMD units.
☆11 · Updated Feb 28, 2017 (9 years ago)
apuaaChen / vectorSparse
☆32 · Updated Aug 24, 2022 (3 years ago)
zchrissirhcz / rocbuild
Better CMake Experience
☆35 · Updated Dec 9, 2025 (7 months ago)
lixiuhong / batched_gemm
☆40 · Updated Feb 28, 2020 (6 years ago)
Kyubyong / specAugment
Tensor2tensor experiment with SpecAugment.
☆46 · Updated May 13, 2019 (7 years ago)
cpuimage / fftw3
FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and…
☆17 · Updated Sep 6, 2018 (7 years ago)
zuoqing1988 / train-ssd
Train SSD.
☆10 · Updated Apr 30, 2019 (7 years ago)
jundaf2 / CUDA-INT8-GEMM
CUDA 8-bit Tensor Core matrix multiplication based on the m16n16k16 WMMA API.
☆37 · Updated Sep 15, 2023 (2 years ago)
maoweinuaa / Cartoonface-detection
Modified from retinaface and centerface; the framework depends on PyTorch.
☆31 · Updated Jul 23, 2020 (6 years ago)
HansRen1024 / PCN-ncnn
PCN based on the ncnn framework.
☆81 · Updated Dec 21, 2018 (7 years ago)
corleonechensiyu / tinyCNN
A simple forward-inference framework built by taking MNN apart (for study!).
☆24 · Updated Feb 21, 2021 (5 years ago)
Oneflow-Inc / conda-env
☆12 · Updated Mar 13, 2023 (3 years ago)
olojuwin / facerecognize-for-mobile-phone
A face recognition model for mobile devices with the same computational cost as mobilefacenet, but 2%+ better on megaface.
☆231 · Updated Apr 17, 2020 (6 years ago)
sandwichfish / VirFace
This is a PyTorch implementation of "VirFace: Enhancing Face Recognition via Unlabeled Shallow Data" (CVPR 2021).
☆22 · Updated Sep 30, 2022 (3 years ago)
yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
☆164 · Updated Feb 3, 2022 (4 years ago)
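lhlvision / face-retrive-recognition-system
Face retrieval and recognition model for Asian faces. Supports face recognition and face retrieval on many platforms; total model size is 9MB; the full pipeline (detection, alignment, feature computation) runs in 40ms on iOS, Android, and PC (Linux, Windows, Mac); self-contained library with no third-party dependencies, easy to deploy, facial recognition system…
☆12 · Updated Dec 15, 2020 (5 years ago)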
ChenhanYu / rnn
General Stride K-Nearest Neighbors
☆14 · Updated Jun 15, 2021 (5 years ago)
OAID / MXNet-HRT
Heterogeneous Run Time version of MXNet. Adds heterogeneous capabilities to MXNet and uses heterogeneous computing infrastructure frame…
☆72 · Updated Feb 11, 2018 (8 years ago)
zuoqing1988 / ZQ_VirtualAD
☆10 · Updated Jun 5, 2018 (8 years ago)
campaul / octave
An 8-bit CPU designed for education.
☆20 · Updated Nov 7, 2014 (11 years ago)
791136190 / awesome-qat
☆21 · Updated Apr 13, 2022 (4 years ago)
GPUPeople / ACSpGEMM
Repository holding the code base for AC-SpGEMM: "Adaptive Sparse Matrix-Matrix Multiplication on the GPU".
☆31 · Updated Jul 7, 2020 (6 years ago)
Smorodov / RotatedRectLib
Small library for working with rotated-rectangle-shaped image regions.
☆16 · Updated Nov 7, 2017 (8 years ago)
BUG1989 / caffe-int8-convert-tools
Generates a quantization parameter file for int8 inference with the ncnn framework.
☆517 · Updated Jul 29, 2020 (6 years ago)
rurban / dieharder
A fixed version of Robert G. Brown's "dieharder" tests for random number generators.
☆13 · Updated Apr 1, 2021 (5 years ago)
aditya4d / gemm-vega64
Implements an assembly GEMM on Vega 64 for 4096×4096 FP32 matrices.
☆22 · Updated Oct 12, 2019 (6 years ago)
zhiqi-0 / RDMA-MXNet-ps-lite
RDMA optimization on MXNet.
☆14 · Updated Nov 12, 2017 (8 years ago)
scarsty / cccc-lite
☆52 · Updated May 27, 2026 (2 months ago)
NervanaSystems / maxas
Assembler for the NVIDIA Maxwell architecture.
☆1,074 · Updated Jan 3, 2023 (3 years ago)
zuoqing1988 / ZQ_SmokeSimulation
☆12 · Updated Jun 5, 2018 (8 years ago)
Sanyuanliu / Caffe_face_attribute_classification
Multi-task learning method for face attribute learning.
☆18 · Updated Nov 29, 2018 (7 years ago)