nihui / vkpeak
A tool which profiles Vulkan devices to find their peak capacities
☆119Updated 7 months ago
Alternatives and similar repositories for vkpeak:
Users who are interested in vkpeak are comparing it to the repositories listed below
- Detect CPU features with a single file☆389Updated last week
- A micro Vulkan compute pipeline and a collection of benchmarking compute shaders☆237Updated last month
- Benchmark your NCNN models on 3DS (or crash)☆10Updated last year
- ☆18Updated 4 years ago
- ☆14Updated last month
- Handy tools & graphics API abstraction for blazing fast prototyping☆9Updated last year
- prebuild package for cross compiling riscv☆18Updated 3 years ago
- A small OpenCL benchmark program to measure peak GPU/CPU performance.☆202Updated 3 weeks ago
- ☆40Updated 2 years ago
- A converter for llama2.c legacy models to ncnn models.☆87Updated last year
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support.☆34Updated 2 years ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.☆269Updated this week
- Call ncnn from Fortran☆14Updated 2 years ago
- A tool which profiles OpenCL devices to find their peak capacities☆441Updated 4 months ago
- A profiler to disclose and quantify hardware features on GPUs.☆168Updated 2 years ago
- ☆131Updated this week
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆38Updated 3 years ago
- Infer RWKV on NCNN☆48Updated 8 months ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)☆397Updated 3 months ago
- Implementation of OpenCL 3.0 on Vulkan☆389Updated 2 weeks ago
- Stretching GPU performance for GEMMs and tensor contractions.☆237Updated this week
- rocDecode is a high performance video decode SDK for AMD hardware☆24Updated this week
- AMD's graph optimization engine.☆216Updated this week
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆183Updated last year
- ROCm's Thunk Interface☆90Updated last month
- Because RKNPU only knows 4D☆33Updated last year
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.☆121Updated last year
- The OpenCL Conformance Tests☆202Updated this week
- ☆140Updated 3 months ago
- HIPIFY: Convert CUDA to Portable C++ Code☆574Updated this week