A profiler to disclose and quantify hardware features on GPUs.
☆176May 15, 2022Updated 3 years ago
Alternatives and similar repositories for ArchProbe
Users that are interested in ArchProbe are comparing it to the libraries listed below
- Multi-branch model for concurrent execution☆18Jun 27, 2023Updated 2 years ago
- A micro Vulkan compute pipeline and a collection of benchmarking compute shaders☆261Mar 27, 2025Updated 11 months ago
- LLM inference in C/C++☆20Oct 22, 2025Updated 4 months ago
- modified cutlass☆15Oct 26, 2020Updated 5 years ago
- A tool for patching the TensorFlow frozen protobuf file for compatibility to RKNN & SNPE SDK.☆11Feb 22, 2021Updated 5 years ago
- ☆10Sep 4, 2025Updated 5 months ago
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆192Aug 17, 2023Updated 2 years ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆19Aug 4, 2022Updated 3 years ago
- ☆19Feb 28, 2022Updated 4 years ago
- ☆256Sep 15, 2023Updated 2 years ago
- Parsers for CUDA binary files☆24Dec 29, 2023Updated 2 years ago
- A tool which profiles Vulkan devices to find their peak capacities☆160Jan 12, 2026Updated last month
- Derivation and numerical validation for the paper "Microsurface Transformations" (EGSR 2022) by Asen Atanasov, Vladimir Koylazov, Rossen …☆23Jul 11, 2022Updated 3 years ago
- ☆15Dec 16, 2021Updated 4 years ago
- row-major matmul optimization☆703Updated this week
- A utility library for application developers to sample Arm Immortalis GPU or Arm Mali GPU performance counters.☆263Jan 13, 2026Updated last month
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.☆1,006Sep 19, 2024Updated last year
- Apple G13 GPU architecture docs and tools☆644May 16, 2025Updated 9 months ago
- Dynamic suballocators for external memory (e.g., Vulkan device memory). Unmaintained - consider migrating to https://crates.io/crates/offs…☆15Jul 22, 2022Updated 3 years ago
- A tool which profiles OpenCL devices to find their peak capacities☆481Dec 3, 2025Updated 2 months ago
- ☆78May 28, 2023Updated 2 years ago
- CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution☆17Jun 25, 2023Updated 2 years ago
- An editor for the ncnn and pnnx formats☆137Oct 7, 2024Updated last year
- EdgeCortix maintained and extended fork of Apache TVM compiler stack utilized by MERA framework. TVM is an open deep learning compiler st…☆11Dec 22, 2023Updated 2 years ago
- Fundamental Sources for Water Wave Animation☆20Dec 8, 2022Updated 3 years ago
- FidelityFX Parallel Sort☆114Oct 8, 2021Updated 4 years ago
- MLPerf™ Mobile models☆26Nov 16, 2025Updated 3 months ago
- Light weight SPIR-V reflection library☆111Updated this week
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib☆58Apr 10, 2023Updated 2 years ago
- Terraform Script for - Storage, container and data life cycle rules creation at scale☆11Jan 10, 2023Updated 3 years ago
- Code for the paper "Faster Neural Network Training with Approximate Tensor Operations"☆10Oct 23, 2021Updated 4 years ago
- TensorFlow and TVM integration☆36Apr 27, 2020Updated 5 years ago
- Python Inference Script(PyIS)☆19Aug 30, 2022Updated 3 years ago
- A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.☆363Jul 30, 2024Updated last year
- DXIL conversion to SPIR-V for D3D12 translation libraries☆219Feb 20, 2026Updated last week
- This repository contains a Vulkan Framework designed to enable developers to get up and running quickly for creating sample content and r…☆147Feb 20, 2026Updated last week
- Slides on how to give a technical talk☆22Feb 8, 2021Updated 5 years ago
- ☆95Nov 4, 2022Updated 3 years ago
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port☆486Oct 23, 2024Updated last year