This repository targets performance optimization of the OpenCL GEMM function. It compares the sgemm performance of several libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, vendors' hardware, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided, so it works out of the box.
☆17 Mar 28, 2019 Updated 6 years ago
Alternatives and similar repositories for gemm_optimization
Users who are interested in gemm_optimization are comparing it to the libraries listed below
- This repository provides a tutorial that discusses running a sample publisher and subscriber using multiple transports of point_cloud_trans… ☆11 Updated this week
- learn TensorRT from scratch🥰 ☆18 Sep 29, 2024 Updated last year
- To better understand the ggml library ☆27 Jun 13, 2025 Updated 8 months ago
- SORT multi-object tracking algorithm based on Hungarian matching and Kalman filtering. ☆19 Mar 10, 2023 Updated 2 years ago
- Repository for code featured in "Machine Learning Assisted Optimization Methods for Automated Antenna Design" ☆10 Dec 28, 2024 Updated last year
- Open-source course projects for AIInfra and AISystem ☆38 Jun 22, 2025 Updated 8 months ago
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch ☆26 Updated this week
- Vector Fitting Tool in MATLAB ☆11 Jun 28, 2019 Updated 6 years ago
- OpenCL for Nets - A Deep Learning Framework based on OpenCL, written in C++. Supports popular MLP, RNN(LSTM), CNN(ResNet). Friendly debug… ☆68 Jun 3, 2019 Updated 6 years ago
- MATLAB code for the 2023 Beijing Institute of Technology digital signal processing and radar courses ☆16 Nov 10, 2025 Updated 3 months ago
- This project is intended to build and deploy an SNPE model on Qualcomm devices when the model has unsupported layers that are not part of… ☆10 Oct 4, 2021 Updated 4 years ago
- Python and MATLAB code for time domain vector fitting ☆13 Feb 20, 2017 Updated 9 years ago
- Software library RLCM (recursively low-rank compressed matrices) ☆14 Apr 15, 2021 Updated 4 years ago
- Lecture page for AAE4011, Semester 2, 2024-2025 ☆12 Mar 20, 2025 Updated 11 months ago
- Direct Numerical Simulation of Turbulence using the Implicitly Dealiased Pseudospectral Method ☆12 Jun 20, 2025 Updated 8 months ago
- iEDA water-drop training initiative ☆13 Sep 10, 2024 Updated last year
- Paper: inexact GMRES with fast multipole method and low-p relaxation ☆11 Aug 23, 2023 Updated 2 years ago
- Deploys LDC, a lightweight dense convolutional neural network for edge detection, with ONNXRuntime; includes both C++ and Python versions ☆11 Apr 24, 2023 Updated 2 years ago
- Singularity Matrix Perturbation (SMP). Author: Nathan Totorica. Date: 5/14/2021. This code was written for a class project in the course e… ☆10 May 14, 2021 Updated 4 years ago
- C++17 Wrapper for ScaLAPACK ☆11 Oct 5, 2023 Updated 2 years ago
- Profitable MT5 Expert Advisors ☆21 Feb 22, 2026 Updated last week
- Codes for antenna array design and optimisation ☆11 Dec 17, 2019 Updated 6 years ago
- ThereminQ CLassiQ - QuantOPS : Orchestrate Qrack, Bonsai, Qimcifa and Tipsy in OpenCL, VCL and CUDA with an X WebUI ☆13 Jan 10, 2026 Updated last month
- AOCL-Utils library to get CPU architecture, Cache information and CPU features flags etc. ☆17 Jan 3, 2026 Updated last month
- Parallel LiDAR Point Cloud Preprocessing for Autonomous Driving Applications ☆10 Apr 2, 2024 Updated last year
- GPU implementation of Winograd convolution ☆10 Oct 23, 2017 Updated 8 years ago
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference ☆12 Apr 11, 2024 Updated last year
- 🎉My Collections of CUDA Kernels~ ☆11 Jun 25, 2024 Updated last year
- YOLOv12 TensorRT end-to-end accelerated model inference and INT8 quantization implementation ☆13 Mar 5, 2025 Updated 11 months ago
- Tool to convert Microsoft Visual C++ projects and solutions to CMake ☆15 Updated this week
- Privacy-Preserving Multiple Tensor Factorization for Synthesizing Large-Scale Location Traces ☆14 Sep 14, 2021 Updated 4 years ago
- A set of tools to work with cgroup tree and process classification/QoS according to it ☆10 Oct 1, 2019 Updated 6 years ago
- Cpp-Taskflow is a C++ library for managing and scheduling tasks that may be dependent on one another, represented as a DAG (directed acyc… ☆13 Apr 5, 2023 Updated 2 years ago
- Matlab codes that solve Maxwell's equations with some light-matter interactions using the finite difference time domain (FDTD) method ☆10 Aug 7, 2019 Updated 6 years ago
- A calibration method for multiple LiDAR and the GNSS-aided INS ☆19 Sep 22, 2025 Updated 5 months ago
- Vector Fitting ☆15 Apr 7, 2022 Updated 3 years ago
- Shared library for tinyspline ☆10 Apr 20, 2024 Updated last year
- Inference deployment of the llama3 ☆11 Apr 21, 2024 Updated last year
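- Approaches and optimizations that use parallel processing when an algorithm is too slow to handle real-time video streams (application level). ☆11 Jun 5, 2021 Updated 4 years ago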