This repository targets performance optimization of the OpenCL GEMM function. It compares the sgemm performance of several libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, vendor hardware, and operating systems. It works out of the box: prebuilt x86_64 binaries are provided for MSVC, MinGW, and Linux (CentOS).
☆17Mar 28, 2019Updated 7 years ago
Alternatives and similar repositories for gemm_optimization
Users that are interested in gemm_optimization are comparing it to the libraries listed below.
- To better understand the ggml library☆28Jun 13, 2025Updated last year
- Numerical experiments on Jacobi SVD algorithm☆10Jun 3, 2018Updated 8 years ago
- learn TensorRT from scratch🥰☆18Sep 29, 2024Updated last year
- SORT multi-object tracking algorithm based on Hungarian matching and Kalman filtering.☆19Mar 10, 2023Updated 3 years ago
- ThereminQ CLassiQ - QuantOPS : Orchestrate Qrack, Bonsai, Qimcifa and Tipsy in OpenCL, VCL and CUDA with an X WebUI☆13Jan 10, 2026Updated 5 months ago
- Interface between Openfermion and Dirac to perform relativistic quantum chemistry calculations simulated on a quantum computer☆14Aug 12, 2022Updated 3 years ago
- This is a tutorial prepared for a summer school in Changsha in 2021☆10Jul 15, 2021Updated 4 years ago
- My personal NUR repository☆10Jun 21, 2026Updated last week
- iEDA water-drop training initiative☆14Sep 10, 2024Updated last year
- MIPS R10000 architecture simulator with C++☆11Jun 8, 2023Updated 3 years ago
- The 'missing header' for Chisel☆24Feb 5, 2026Updated 4 months ago
- A poor man's density functional theory program☆14May 18, 2026Updated last month
- The official website of One Student One Chip project.☆12Feb 5, 2026Updated 4 months ago
- A set of tools to work with cgroup tree and process classification/QoS according to it☆10Oct 1, 2019Updated 6 years ago
- DeepSeek-V3, R1 671B on 8xH100 Throughput Benchmarks☆22Mar 13, 2025Updated last year
- C++ implementation of the algorithm in "Fast and Accurate Least-Mean-Squares Solvers", NIPS19☆11Mar 4, 2020Updated 6 years ago
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch☆27Jun 10, 2026Updated 2 weeks ago
- Paper: inexact GMRES with fast multipole method and low-p relaxation☆11Aug 23, 2023Updated 2 years ago
- ☆12May 22, 2022Updated 4 years ago
- This project is intended to build and deploy an SNPE model on Qualcomm Devices, which are having unsupported layers which are not part of…☆10Oct 4, 2021Updated 4 years ago
- A small collection of Quantum Phase Estimation algorithms coded in python using IBM's qiskit library.☆11Nov 25, 2021Updated 4 years ago
- The matlab code of Sparse Contextual Activation (SCA) published in TIP 2016☆10Mar 18, 2018Updated 8 years ago
- Securing Data Analytics on Intel SGX using Randomization☆13Aug 30, 2017Updated 8 years ago
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference☆12Apr 11, 2024Updated 2 years ago
- Repository for code featured in "Machine Learning Assisted Optimization Methods for Automated Antenna Design"☆10Dec 28, 2024Updated last year
- Netlib Scalapack with robust CMake☆14Mar 26, 2026Updated 3 months ago
- Personal notes☆18Jun 22, 2026Updated last week
- A VQE-based quantum chemistry simulator☆13Jun 20, 2020Updated 6 years ago
- ☆13Mar 31, 2017Updated 9 years ago
- Collection of codes used for quantum chemistry calculations, including Hartree-Fock, Coupled Cluster (CCSD), EOMCC, and other various thi…☆19May 17, 2021Updated 5 years ago
- Basic chisel difftest environment for RTL design (WIP)☆21Mar 8, 2025Updated last year
- C++ genetic algorithms scientific library☆15Aug 12, 2023Updated 2 years ago
- YOLOv12 TensorRT end-to-end accelerated model inference and INT8 quantization implementation☆14Mar 5, 2025Updated last year
- Converts CLIP models to ONNX☆11Jan 17, 2023Updated 3 years ago
- ☆16May 3, 2024Updated 2 years ago
- Caffe: a fast open framework for deep learning.☆10May 19, 2017Updated 9 years ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆45Oct 25, 2021Updated 4 years ago
- SODECL is a library of ordinary differential equation (ODE) and stochastic differential equation (SDE) solvers in OpenCL.☆11Jul 4, 2020Updated 5 years ago
- Direct Numerical Simulation of Turbulence using the Implicitly Dealiased Pseudospectral Method☆12Jun 20, 2025Updated last year