How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.
☆73Apr 15, 2019Updated 6 years ago
Alternatives and similar repositories for cpu_gemm_opt
Users that are interested in cpu_gemm_opt are comparing it to the libraries listed below
- Made a CPU in Logisim when I was 14 (2009), and wrote a naive assembler and compiler for it in Flash. The CPU's design is inspired by Don…☆10Sep 30, 2016Updated 9 years ago
- ☆1,992Jul 29, 2023Updated 2 years ago
- ☆12Mar 13, 2023Updated 2 years ago
- ☆10Jun 5, 2018Updated 7 years ago
- Lecture on SIMD units☆11Feb 28, 2017Updated 9 years ago
- train ssd☆10Apr 30, 2019Updated 6 years ago
- SGEMM and DGEMM subroutines using AVX512F instructions.☆15May 22, 2022Updated 3 years ago
- row-major matmul optimization☆707Feb 24, 2026Updated last week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.☆138Sep 25, 2023Updated 2 years ago
- The benchmark of ncnn that is a high-performance neural network inference framework optimized for the mobile platform☆72Mar 8, 2019Updated 6 years ago
- a Deep Residual Network Example for MXNet on cifar10 dataset☆20Jan 27, 2016Updated 10 years ago
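- Asian face retrieval and recognition model. Supports face recognition and face retrieval on all platforms; total model size is 9 MB; on iOS, Android, and PC (Linux, Windows, Mac) the full pipeline (detection, alignment, feature computation) runs in 40 ms; the library is standalone with no third-party dependencies, so it is easy to deploy, facial recognition system…☆12Dec 15, 2020Updated 5 years ago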
- Using OpenCV and MATLAB for edge detection in the Lab colorspace☆12Feb 20, 2015Updated 11 years ago
- Tensor2tensor experiment with SpecAugment☆46May 13, 2019Updated 6 years ago
- Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels☆14Aug 26, 2015Updated 10 years ago
- Caffe: a fast open framework for deep learning.☆12Apr 6, 2017Updated 8 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API☆35Sep 15, 2023Updated 2 years ago
- ☆12Jun 5, 2018Updated 7 years ago
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"☆31Jul 7, 2020Updated 5 years ago
- ☆12Dec 19, 2016Updated 9 years ago
- ☆12Sep 29, 2017Updated 8 years ago
- An experimental CPU design☆14Feb 9, 2020Updated 6 years ago
- This repo contains LaTeX template for experiment report.☆11Aug 17, 2021Updated 4 years ago
- CPU Physically Based Path Tracer Engine☆15May 14, 2021Updated 4 years ago
- A set of benchmarks to compare the main 2D marker detection and tracking libraries☆11Oct 19, 2017Updated 8 years ago
- ☆32Aug 24, 2022Updated 3 years ago
- Some deep learning models written with mxnet and C++11.☆12Feb 6, 2018Updated 8 years ago
- ☆16Nov 21, 2017Updated 8 years ago
- General Stride K-Nearest Neighbors☆14Jun 15, 2021Updated 4 years ago
- ☆13Feb 26, 2017Updated 9 years ago
- The xyz algorithm for fast interaction search in high-dimensional data.☆14Jul 28, 2017Updated 8 years ago
- ☆17Jun 5, 2018Updated 7 years ago
- Visualize TVM Relay program graph☆12Nov 19, 2019Updated 6 years ago
- ☆18May 14, 2024Updated last year
- Face recognition model for mobile devices; same compute cost as mobilefacenet, but 2%+ better on megaface☆232Apr 17, 2020Updated 5 years ago
- train Gender and Age☆39Nov 6, 2018Updated 7 years ago
- The website of the Objects365 Dataset☆13Jun 15, 2023Updated 2 years ago
- ☆22Aug 14, 2024Updated last year
- ☆40Feb 28, 2020Updated 6 years ago