kunpengcompute / AvxToNeon
Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
☆120 Updated last year
Alternatives and similar repositories for AvxToNeon:
Users who are interested in AvxToNeon are comparing it to the libraries listed below:
- Example code for Intel AVX / AVX2 intrinsics. ☆137 Updated last year
- Assembler for NVIDIA Fermi. Imported from Google Code. ☆72 Updated 10 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs. ☆89 Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth. ☆104 Updated 7 years ago
- ☆51 Updated 5 years ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆58 Updated 5 months ago
- ☆150 Updated this week
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly. ☆117 Updated 2 years ago
- ROCm - AMDGPU Compute Application Binary Interface ☆41 Updated 3 years ago
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i… ☆14 Updated last year
- ☆56 Updated 3 weeks ago
- Portable 128-bit SIMD intrinsics ☆58 Updated last year
- ☆85 Updated last month
- SYCL Reference Manual ☆27 Updated 11 months ago
- SYCL Open Source Specification ☆134 Updated last week
- Collection of synchronization micro-benchmarks and traces from infrastructure applications ☆41 Updated 2 months ago
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API ☆102 Updated last month
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆81 Updated 5 years ago
- ☆61 Updated 3 months ago
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆141 Updated this week
- Arm C Language Extensions (ACLE) ☆103 Updated 2 weeks ago
- ☆34 Updated last year
- A profiler to disclose and quantify hardware features on GPUs. ☆168 Updated 2 years ago
- Intel® GPU Compute Samples ☆106 Updated last week
- Emulating DMA Engines on GPUs for Performance and Portability ☆39 Updated 9 years ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆39 Updated this week
- SYCL Conformance Tests ☆69 Updated this week
- GPUDirect Async support for IB Verbs ☆110 Updated 2 years ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator. ☆38 Updated 3 years ago
- Tools and Reference Code for Intel Optimizations (e.g., Large Pages) ☆140 Updated 6 months ago