carlushuang / cpu_gemm_opt
How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.
☆73Apr 15, 2019Updated 6 years ago
Alternatives and similar repositories for cpu_gemm_opt
Users who are interested in cpu_gemm_opt are comparing it to the libraries listed below.
- Made a CPU in Logisim when I was 14 (2009), and wrote a naive assembler and compiler for it in Flash. The CPU's design is inspired by Don…☆10Sep 30, 2016Updated 9 years ago
- ☆1,988Jul 29, 2023Updated 2 years ago
- ☆12Mar 13, 2023Updated 2 years ago
- SGEMM and DGEMM subroutines using AVX512F instructions.☆15May 22, 2022Updated 3 years ago
- train ssd☆10Apr 30, 2019Updated 6 years ago
- row-major matmul optimization☆701Aug 20, 2025Updated 5 months ago
- The benchmark of ncnn that is a high-performance neural network inference framework optimized for the mobile platform☆72Mar 8, 2019Updated 6 years ago
- a Deep Residual Network Example for MXNet on cifar10 dataset☆20Jan 27, 2016Updated 10 years ago
- PCN based on ncnn framework.☆81Dec 21, 2018Updated 7 years ago
- Using OpenCV and MatLab for edge detection in the Lab colorspace☆12Feb 20, 2015Updated 10 years ago
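- Asian face retrieval and recognition model; supports face recognition and face retrieval on all platforms; total model size 9 MB; on iOS, Android, and PC (Linux, Windows, Mac) the full pipeline (detection, alignment, feature computation) runs in 40 ms; standalone library with no third-party dependencies, easy to deploy; facial recognition system…☆12Dec 15, 2020Updated 5 years ago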
- Tensor2tensor experiment with SpecAugment☆46May 13, 2019Updated 6 years ago
- Caffe: a fast open framework for deep learning.☆12Apr 6, 2017Updated 8 years ago
- Scalable GPU Kernel Fission/Fusion Transformation for Memory-Bound Kernels☆14Aug 26, 2015Updated 10 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API☆35Sep 15, 2023Updated 2 years ago
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"☆31Jul 7, 2020Updated 5 years ago
- ☆12Dec 19, 2016Updated 9 years ago
- This repo contains LaTeX template for experiment report.☆11Aug 17, 2021Updated 4 years ago
- Demonstrations for the interactive exploration of selected core concepts of audio, image and video processing as well as related topics☆17Sep 4, 2025Updated 5 months ago
- CPU Physically Based Path Tracer Engine☆15May 14, 2021Updated 4 years ago
- A set of benchmarks to compare the main 2D marker detection and tracking libraries☆11Oct 19, 2017Updated 8 years ago
- ☆12Sep 29, 2017Updated 8 years ago
- ☆32Aug 24, 2022Updated 3 years ago
- ☆17Jun 5, 2018Updated 7 years ago
- ☆16Nov 21, 2017Updated 8 years ago
- ☆13Feb 26, 2017Updated 8 years ago
- The xyz algorithm for fast interaction search in high-dimensional data.☆14Jul 28, 2017Updated 8 years ago
- ☆18May 14, 2024Updated last year
- train Gender and Age☆39Nov 6, 2018Updated 7 years ago
- The website of the Objects365 Dataset☆13Jun 15, 2023Updated 2 years ago
- ☆21Aug 14, 2024Updated last year
- Small library for working with rotated rectangle shaped image regions.☆16Nov 7, 2017Updated 8 years ago
- RDMA Optimization on MXNet☆14Nov 12, 2017Updated 8 years ago
- Robust Tracking Using Region Proposal Networks☆13Jun 10, 2017Updated 8 years ago
- ☆40Feb 28, 2020Updated 5 years ago
- Torch is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to a…☆37Aug 4, 2022Updated 3 years ago
- ☆16Apr 11, 2022Updated 3 years ago
- multi-task learning method for face attributes learning☆18Nov 29, 2018Updated 7 years ago
- Quantized Tiny Yolo Demo on Android☆30Jun 20, 2017Updated 8 years ago