A profiler to disclose and quantify hardware features on GPUs.
☆176May 15, 2022Updated 3 years ago
Alternatives and similar repositories for ArchProbe
Users who are interested in ArchProbe are comparing it to the libraries listed below.
- Multi-branch model for concurrent execution☆18Jun 27, 2023Updated 2 years ago
- LLM inference in C/C++☆20Oct 22, 2025Updated 4 months ago
- ☆19Feb 28, 2022Updated 4 years ago
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆190Aug 17, 2023Updated 2 years ago
- A micro Vulkan compute pipeline and a collection of benchmarking compute shaders☆261Mar 27, 2025Updated 11 months ago
- Get Windows System Root certificates☆17Jan 21, 2026Updated last month
- ☆10Sep 4, 2025Updated 6 months ago
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆19Aug 4, 2022Updated 3 years ago
- ☆256Sep 15, 2023Updated 2 years ago
- Tensor Tiling Library☆38Sep 23, 2025Updated 5 months ago
- A tool for patching the TensorFlow frozen protobuf file for compatibility to RKNN & SNPE SDK.☆11Feb 22, 2021Updated 5 years ago
- A utility library for application developers to sample Arm Immortalis GPU or Arm Mali GPU performance counters.☆266Mar 2, 2026Updated 2 weeks ago
- row-major matmul optimization☆707Feb 24, 2026Updated 3 weeks ago
- A tool which profiles Vulkan devices to find their peak capacities☆163Feb 27, 2026Updated 3 weeks ago
- Apple G13 GPU architecture docs and tools☆647May 16, 2025Updated 10 months ago
- MLPerf™ Mobile models☆26Nov 16, 2025Updated 4 months ago
- modified cutlass☆15Oct 26, 2020Updated 5 years ago
- Dynamic suballocators for external memory (e.g., Vulkan device memory). Unmaintained - consider migrating to https://crates.io/crates/offs…☆15Jul 22, 2022Updated 3 years ago
- A tool which profiles OpenCL devices to find their peak capacities☆483Mar 10, 2026Updated last week
- ☆25Feb 20, 2024Updated 2 years ago
- A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.☆362Jul 30, 2024Updated last year
- Parsers for CUDA binary files☆24Dec 29, 2023Updated 2 years ago
- Lossy GPU-friendly image compression/decompression☆13Dec 5, 2021Updated 4 years ago
- MFCStoreClient is an example of how to access Windows Store APIs from a C++ MFC app.☆20Sep 1, 2022Updated 3 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.☆1,003Sep 19, 2024Updated last year
- Manually implemented quantization-aware training☆23Oct 12, 2022Updated 3 years ago
- FLOꟼ - An MIT-licensed image viewer equipped with a GPU-accelerated perceptual image diffing algorithm based on ꟻLIP☆68Jun 12, 2022Updated 3 years ago
- ☆12Mar 1, 2024Updated 2 years ago
- Solution by team 新手门槛 (third place) in the 2nd Alibaba Cloud Database Competition☆10Apr 19, 2021Updated 4 years ago
- Apple GPU microarchitecture☆581Sep 22, 2024Updated last year
- A stub OpenCL library that dynamically dlopens/dlsyms OpenCL implementations at runtime based on environment variables. Will be useful when…☆74Mar 4, 2024Updated 2 years ago
- Converter from MegEngine to other frameworks☆69Apr 27, 2023Updated 2 years ago
- Yinghan's Code Sample☆364Jul 25, 2022Updated 3 years ago
- Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib☆59Apr 10, 2023Updated 2 years ago
- Fundamental Sources for Water Wave Animation☆20Dec 8, 2022Updated 3 years ago
- MobiSys#114☆23Aug 17, 2023Updated 2 years ago
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.☆752Aug 6, 2025Updated 7 months ago
- ☆63Dec 5, 2021Updated 4 years ago
- Editor for ncnn and pnnx formats☆137Oct 7, 2024Updated last year