powderluv / mm_benchmarks
☆12 Updated 4 years ago
Alternatives and similar repositories for mm_benchmarks:
Users who are interested in mm_benchmarks are comparing it to the repositories listed below
- assembler for NVIDIA FERMI. Imported from Google Code ☆72 Updated 10 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130 Updated last year
- ☆23 Updated 3 weeks ago
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload. ☆120 Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability ☆39 Updated 9 years ago
- Benchmarks for auto-vectorization and revectorization, including both hand-vectorized and scalar code ☆28 Updated 6 years ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆38 Updated 3 years ago
- Conversions to MLIR EmitC ☆128 Updated 4 months ago
- ☆56 Updated last month
- A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs. ☆17 Updated 2 years ago
- CUPTI GPU Profiler ☆37 Updated 6 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- ☆44 Updated 4 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆89 Updated last year
- CUDAAdvisor: a GPU profiling tool ☆49 Updated 6 years ago
- ROCm - AMDGPU Compute Application Binary Interface ☆41 Updated 3 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- CuPBoP-AMD is a CUDA translator that translates CUDA programs at NVVM IR level to HIP-compatible IR that can run on AMD GPUs. ☆36 Updated last year
- CNNs in Halide ☆23 Updated 9 years ago
- ☆51 Updated 5 years ago
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i… ☆14 Updated last year
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API ☆107 Updated this week
- ☆150 Updated 2 weeks ago
- BLIS fork with kernels for Apple M1. (Perhaps) The first open-source BLAS with Apple Matrix Coprocessor support. ☆34 Updated 2 years ago
- Instruction latency & throughput profiler for AArch64 ☆34 Updated last year
- ☆28 Updated 2 years ago
- Tutorials for ARM SVE on Docker ☆43 Updated 2 years ago
- CERE: Codelet Extractor and REplayer ☆40 Updated last year
- A profiler to disclose and quantify hardware features on GPUs. ☆168 Updated 2 years ago
- Microbenchmarks for Aarch64 (Cortex A53) ☆12 Updated 2 years ago