kunpengcompute / AvxToNeon
Encapsulates frequently used AVX instructions as independent modules to reduce repeated development work.
☆117 Updated last year
Alternatives and similar repositories for AvxToNeon:
Users who are interested in AvxToNeon are comparing it to the libraries listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆128 Updated last year
- Assembler for NVIDIA FERMI. Imported from Google Code ☆71 Updated 9 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly. ☆109 Updated 2 years ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆56 Updated 2 months ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆82 Updated 9 months ago
- ☆147 Updated 3 weeks ago
- Arm C Language Extensions (ACLE) ☆95 Updated this week
- ☆56 Updated 2 weeks ago
- Magnum IO community repo ☆81 Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆78 Updated this week
- Tools and Reference Code for Intel Optimizations (e.g., Large Pages) ☆139 Updated 3 months ago
- SYCL Reference Manual ☆27 Updated 8 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- GPU-Accelerated Lossless Data Compressors Survey ☆112 Updated 4 years ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope… ☆57 Updated this week
- A tool for examining GPU scheduling behavior. ☆71 Updated 5 months ago
- Collection of synchronization micro-benchmarks and traces from infrastructure applications ☆40 Updated 7 months ago
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆72 Updated last year
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆133 Updated this week
- ☆51 Updated 5 years ago
- Intel® GPU Compute Samples ☆100 Updated last month
- GPUDirect Async support for IB Verbs ☆92 Updated 2 years ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆38 Updated this week
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API ☆93 Updated this week
- Introduction to Intel AVX-512 ☆41 Updated last year
- The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github… ☆32 Updated last month
- Stretching GPU performance for GEMMs and tensor contractions. ☆231 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆218 Updated this week
- Portable 128-bit SIMD intrinsics ☆57 Updated last year
- SYCL Open Source Specification ☆122 Updated this week