The repository targets performance optimization of the OpenCL GEMM function. It compares the sgemm performance of clBLAS, CLBlast, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA) across different matrix sizes, vendors' hardware, and operating systems. Ready-to-run x86_64 binaries are provided for MSVC, MinGW, and Linux (CentOS).
☆17 · Mar 28, 2019 · Updated 7 years ago
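Benchmarks like this usually report SGEMM throughput in GFLOPS. As a minimal sketch (not the repository's code), the C below shows a naive row-major reference kernel computing C = alpha*A*B + beta*C, which is the kind of baseline tuned libraries are checked against, plus a helper that turns a measured runtime into GFLOPS using the conventional 2*M*N*K operation count. The names `sgemm_ref` and `gemm_gflops` are hypothetical and do not belong to any of the libraries listed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naive reference SGEMM (row-major, no transposes):
 * C = alpha * A * B + beta * C, with A (M x K), B (K x N), C (M x N). */
void sgemm_ref(size_t M, size_t N, size_t K, float alpha,
               const float *A, const float *B, float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < K; ++p)
                acc += A[i * K + p] * B[p * N + j];
            float ab = alpha * acc;
            /* As in reference BLAS, C is not read when beta == 0,
             * so uninitialized or NaN entries are simply overwritten. */
            C[i * N + j] = (beta == 0.0f) ? ab : ab + beta * C[i * N + j];
        }
    }
}

/* GEMM performs M*N*K multiply-add pairs, conventionally counted as
 * 2*M*N*K floating-point operations; seconds is the measured runtime. */
double gemm_gflops(size_t M, size_t N, size_t K, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)M * (double)N * (double)K / (seconds * 1e9);
}
```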
Alternatives and similar repositories for gemm_optimization
Users interested in gemm_optimization are comparing it to the libraries listed below.
- This repository provides a tutorial that discusses running a sample publisher and subscriber using multiple transports of point_cloud_trans… ☆11 · Mar 17, 2026 · Updated 3 weeks ago
- Numerical experiments on the Jacobi SVD algorithm ☆10 · Jun 3, 2018 · Updated 7 years ago
- Learn TensorRT from scratch 🥰 ☆18 · Sep 29, 2024 · Updated last year
- SORT multi-object tracking algorithm based on Hungarian matching and Kalman filtering. ☆19 · Mar 10, 2023 · Updated 3 years ago
- ThereminQ CLassiQ - QuantOPS: Orchestrate Qrack, Bonsai, Qimcifa and Tipsy in OpenCL, VCL and CUDA with an X WebUI ☆13 · Jan 10, 2026 · Updated 2 months ago
- Plane Wave Density Functional Theory Code for the GPU ☆12 · Jan 23, 2015 · Updated 11 years ago
- MIPS R10000 architecture simulator in C++ ☆10 · Jun 8, 2023 · Updated 2 years ago
- SURF MPS implementation for all GPUs in a node ☆11 · Apr 13, 2022 · Updated 3 years ago
- Do some exercises ☆14 · Dec 2, 2025 · Updated 4 months ago
- A poor man's density functional theory program ☆14 · Feb 1, 2026 · Updated 2 months ago
- The official website of the One Student One Chip project. ☆12 · Feb 5, 2026 · Updated 2 months ago
- C++ implementation of the algorithm in "Fast and Accurate Least-Mean-Squares Solvers", NIPS19 ☆11 · Mar 4, 2020 · Updated 6 years ago
- Erasure code library for Erlang ☆12 · Sep 5, 2024 · Updated last year
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding PyTorch ☆27 · Mar 12, 2026 · Updated 3 weeks ago
- Compress BiSeNet with Structure Knowledge Distillation for real-time image segmentation on wali-TX2 ☆11 · Jul 29, 2020 · Updated 5 years ago
- ☆20 · Oct 1, 2018 · Updated 7 years ago
- Paper: inexact GMRES with the fast multipole method and low-p relaxation ☆11 · Aug 23, 2023 · Updated 2 years ago
- A [Genshin Impact] artifact enhancement predictor. ☆12 · May 18, 2022 · Updated 3 years ago
- ☆12 · May 22, 2022 · Updated 3 years ago
- This project is intended to build and deploy, on Qualcomm devices, an SNPE model that has unsupported layers which are not part of… ☆10 · Oct 4, 2021 · Updated 4 years ago
- A small collection of Quantum Phase Estimation algorithms coded in Python using IBM's Qiskit library. ☆12 · Nov 25, 2021 · Updated 4 years ago
- Securing Data Analytics on Intel SGX using Randomization ☆13 · Aug 30, 2017 · Updated 8 years ago
- MATLAB code for Sparse Contextual Activation (SCA), published in TIP 2016 ☆10 · Mar 18, 2018 · Updated 8 years ago
- Fast, multithreaded, AVX/FMA matrix multiplication kernel in C++17 ☆18 · Nov 28, 2018 · Updated 7 years ago
- Semantic segmentation using PyTorch ☆11 · Dec 1, 2017 · Updated 8 years ago
- ☆16 · Aug 11, 2016 · Updated 9 years ago
- C++ genetic algorithms scientific library ☆15 · Aug 12, 2023 · Updated 2 years ago
- ☆12 · Mar 31, 2017 · Updated 9 years ago
- Clone of https://code.google.com/p/google-coredumper/ with enhancements by Amadeus ☆13 · Jul 2, 2024 · Updated last year
- Basic Chisel difftest environment for RTL design (WIP) ☆20 · Mar 8, 2025 · Updated last year
- YOLOv12 with TensorRT: end-to-end accelerated model inference and INT8 quantization ☆13 · Mar 5, 2025 · Updated last year
- Converts CLIP models to ONNX ☆11 · Jan 17, 2023 · Updated 3 years ago
- Caffe: a fast open framework for deep learning. ☆10 · May 19, 2017 · Updated 8 years ago
- Software library RLCM (recursively low-rank compressed matrices) ☆14 · Apr 15, 2021 · Updated 4 years ago
- ☆10 · Sep 3, 2016 · Updated 9 years ago
- A translator that converts NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator. ☆45 · Oct 25, 2021 · Updated 4 years ago
- MATLAB implementation of the CS video reconstruction method RRS ☆11 · May 21, 2018 · Updated 7 years ago
- Xiangshan deterministic workload generator ☆24 · May 14, 2025 · Updated 10 months ago
- [CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation ☆11 · Jan 6, 2023 · Updated 3 years ago