The repository targets performance optimization of the OpenCL GEMM function. It compares the sgemm performance of several BLAS libraries, clBLAS, CLBlast, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA), across different matrix sizes, vendors' hardware, and operating systems. Ready-to-use binaries are provided for MSVC, MinGW, and Linux (CentOS) x86_64, so it works out of the box.
☆17 · Mar 28, 2019 · Updated 6 years ago
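For readers new to the metric being compared, here is a minimal Python sketch, not taken from the repository, of what a GEMM call computes (C = alpha * A * B + beta * C) and the common convention of counting 2 * M * N * K floating-point operations per call when reporting GFLOPS. The names `gemm_reference`, `gemm_gflops`, and `benchmark` are hypothetical. Python floats are 64-bit, so this mirrors the sgemm interface rather than its single precision, and its timings say nothing about the OpenCL, MKL, or CUDA libraries themselves.

```python
import time


def gemm_reference(alpha, A, B, beta, C):
    """Return alpha * A @ B + beta * C for row-major lists of lists.

    A is M x K, B is K x N, C is M x N. No transposes, no blocking.
    Hypothetical helper; Python floats are double precision, not float32.
    """
    M, K = len(A), len(A[0])
    K2, N = len(B), len(B[0])
    if K != K2 or len(C) != M or len(C[0]) != N:
        raise ValueError("incompatible GEMM shapes")
    out = []
    for i in range(M):
        row = []
        for j in range(N):
            acc = 0.0
            for p in range(K):
                acc += A[i][p] * B[p][j]  # one multiply and one add per step
            row.append(alpha * acc + beta * C[i][j])
        out.append(row)
    return out


def gemm_gflops(M, N, K, seconds):
    """Throughput under the usual 2*M*N*K flop-count convention."""
    return 2.0 * M * N * K / seconds / 1e9


def benchmark(M, N, K, repeats=3):
    """Time gemm_reference on constant matrices and report the best run."""
    A = [[1.0] * K for _ in range(M)]
    B = [[1.0] * N for _ in range(K)]
    C = [[0.0] * N for _ in range(M)]
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        gemm_reference(1.0, A, B, 0.0, C)
        best = min(best, time.perf_counter() - t0)
    return gemm_gflops(M, N, K, best)


if __name__ == "__main__":
    for size in (16, 32, 64):
        print(f"{size}x{size}x{size}: {benchmark(size, size, size):.4f} GFLOPS")
```

Taking the best of several repeats is a common way to reduce timer noise; real GEMM benchmarks such as the ones this repository compares also sweep many matrix shapes, since throughput varies strongly with size.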
Alternatives and similar repositories for gemm_optimization
Users interested in gemm_optimization are comparing it to the repositories listed below.
- This repository provides a tutorial on running the sample publisher and subscriber using multiple transports of point_cloud_trans… ☆11 · Feb 24, 2026 · Updated 2 weeks ago
- Learn TensorRT from scratch🥰 ☆18 · Sep 29, 2024 · Updated last year
- To better understand the ggml library ☆28 · Jun 13, 2025 · Updated 8 months ago
- SORT multi-object tracking algorithm based on Hungarian matching and Kalman filtering. ☆18 · Mar 10, 2023 · Updated 3 years ago
- Repository for code featured in "Machine Learning Assisted Optimization Methods for Automated Antenna Design" ☆10 · Dec 28, 2024 · Updated last year
- AIInfra and AISystem open-source course projects ☆40 · Jun 22, 2025 · Updated 8 months ago
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding PyTorch ☆25 · Updated this week
- Vector Fitting Tool in MATLAB ☆11 · Jun 28, 2019 · Updated 6 years ago
- OpenCL for Nets - A Deep Learning Framework based on OpenCL, written in C++. Supports popular MLP, RNN (LSTM), CNN (ResNet). Friendly debug… ☆68 · Jun 3, 2019 · Updated 6 years ago
- This project is intended to build and deploy an SNPE model on Qualcomm devices that have unsupported layers which are not part of… ☆10 · Oct 4, 2021 · Updated 4 years ago
- MATLAB code for the 2023 Beijing Institute of Technology digital signal processing and radar courses ☆16 · Nov 10, 2025 · Updated 4 months ago
- Python and MATLAB code for time domain vector fitting ☆13 · Feb 20, 2017 · Updated 9 years ago
- ThereminQ CLassiQ - QuantOPS: Orchestrate Qrack, Bonsai, Qimcifa and Tipsy in OpenCL, VCL and CUDA with an X WebUI ☆13 · Jan 10, 2026 · Updated 2 months ago
- Direct Numerical Simulation of Turbulence using the Implicitly Dealiased Pseudospectral Method ☆12 · Jun 20, 2025 · Updated 8 months ago
- Software library RLCM (recursively low-rank compressed matrices) ☆14 · Apr 15, 2021 · Updated 4 years ago
- Paper: inexact GMRES with fast multipole method and low-p relaxation ☆11 · Aug 23, 2023 · Updated 2 years ago
- Codes for antenna array design and optimisation ☆11 · Dec 17, 2019 · Updated 6 years ago
- Lecture page for AAE4011, Semester 2, 2024-2025 ☆12 · Mar 20, 2025 · Updated 11 months ago
- C++17 Wrapper for ScaLAPACK ☆11 · Oct 5, 2023 · Updated 2 years ago
- iEDA water-drop training initiative ☆13 · Sep 10, 2024 · Updated last year
- Deploys LDC, a lightweight dense convolutional neural network for edge detection, with ONNXRuntime; includes both C++ and Python versions ☆11 · Apr 24, 2023 · Updated 2 years ago
- Author: Nathan Totorica, Date: 5/14/2021. Singularity Matrix Perturbation (SMP): This code was written for a class project in the course e… ☆10 · May 14, 2021 · Updated 4 years ago
- 2D Gaussian Splatting implemented in Triton ☆11 · Mar 19, 2024 · Updated last year
- Includes the SVD-based approximation algorithms for compressing deep learning models and the FPGA accelerators exploiting such approximat… ☆16 · Mar 3, 2023 · Updated 3 years ago
- DERD-Net: Learning Depth from Event-based Ray Densities (NeurIPS 2025 Spotlight) ☆16 · Nov 22, 2025 · Updated 3 months ago
- Schrodinger-Poisson solver in 1D in the conduction band ☆12 · Jan 2, 2022 · Updated 4 years ago
- Computes effective mode in a 1D wave guide ☆10 · Aug 9, 2021 · Updated 4 years ago
- A calibration method for multiple LiDARs and the GNSS-aided INS ☆19 · Sep 22, 2025 · Updated 5 months ago
- C++ code for deploying FastSAM with RKNN ☆14 · May 30, 2024 · Updated last year
- Caffe: a fast open framework for deep learning. ☆10 · May 19, 2017 · Updated 8 years ago
- Inference deployment of llama3 ☆11 · Apr 21, 2024 · Updated last year
- Erasure code library for Erlang ☆12 · Sep 5, 2024 · Updated last year
- [CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation ☆11 · Jan 6, 2023 · Updated 3 years ago
- Cpp-Taskflow is a C++ library for managing and scheduling tasks that may be dependent on one another, represented as a DAG (directed acyc… ☆13 · Apr 5, 2023 · Updated 2 years ago
- MATLAB implementation of Sparsity-promoting Least Mean Square (SLMS) and Normalized Least Mean Square (SNLMS) adaptive filters for system… ☆11 · Oct 3, 2020 · Updated 5 years ago
- Software ☆10 · Dec 5, 2024 · Updated last year
- AutoRNP -- Automated Repair of High Floating-Point Errors in Numerical Libraries ☆12 · Dec 28, 2018 · Updated 7 years ago
- Shared library for tinyspline ☆10 · Apr 20, 2024 · Updated last year
- A Three-Dimensional, Serial Fast Multipole Method Code Based on the Work of Walter Dehnen ☆10 · Jun 12, 2018 · Updated 7 years ago