kunpengcompute / AvxToNeon
Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
☆121 Updated last year
Alternatives and similar repositories for AvxToNeon
Users who are interested in AvxToNeon are comparing it to the libraries listed below.
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs ☆115 Updated 3 weeks ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆90 Updated last year
- Arm C Language Extensions (ACLE) ☆109 Updated 2 weeks ago
- assembler for NVIDIA FERMI. Imported from Google Code ☆72 Updated 10 years ago
- ☆58 Updated 2 weeks ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆64 Updated 8 months ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆119 Updated 2 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆138 Updated last year
- ☆150 Updated this week
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆40 Updated 3 years ago
- code for benchmarking GPU performance based on cublasSgemm and cublasHgemm ☆31 Updated 3 years ago
- Fast AVX512 (AVX-512) quicksort + bitonic sort. ☆28 Updated 3 years ago
- ☆52 Updated 5 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆40 Updated 10 years ago
- A tool for examining GPU scheduling behavior. ☆84 Updated 10 months ago
- PROGRESS64 is a C library of scalable functions for concurrent programs, primarily focused on networking applications. ☆92 Updated 2 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 Updated 7 years ago
- Intel® Data Mover Library (Intel® DML) ☆95 Updated 2 months ago
- Portable 128-bit SIMD intrinsics ☆58 Updated last year
- Tools and Reference Code for Intel Optimizations (eg Large Pages) ☆143 Updated 9 months ago
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of AVX, AVX2, AVX-512 vector intrinsics to enable so… ☆56 Updated 2 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133 Updated last year
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆227 Updated 3 weeks ago
- A profiler to disclose and quantify hardware features on GPUs. ☆171 Updated 3 years ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated this week
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆81 Updated 5 years ago
- SYCL Reference Manual ☆28 Updated last year
- Collection of synchronization micro-benchmarks and traces from infrastructure applications ☆44 Updated last week
- Introduction to Intel AVX-512 ☆49 Updated last year
- Intel® GPU Compute Samples ☆108 Updated last month