kunpengcompute / AvxToNeon
Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
☆126Updated last year
Alternatives and similar repositories for AvxToNeon
Users that are interested in AvxToNeon are comparing it to the libraries listed below
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc…☆66Updated last year
- Example code for Intel AVX / AVX2 intrinsics.☆142Updated 2 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆91Updated last year
- ☆58Updated this week
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts☆227Updated 11 months ago
- ☆154Updated last week
- GPU-Accelerated Lossless Data Compressors Survey☆120Updated 5 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.☆124Updated 2 years ago
- A profiler to disclose and quantify hardware features on GPUs.☆174Updated 3 years ago
- Intel® GPU Compute Samples☆109Updated last month
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs☆122Updated 2 weeks ago
- ☆143Updated 2 weeks ago
- Arm C Language Extensions (ACLE)☆115Updated this week
- Utilities to measure read access times of caches, memory, and hardware prefetches for simple and fused operations☆85Updated 2 years ago
- Fast AVX512 (AVX-512) quicksort + bitonic sort.☆28Updated 3 years ago
- Intel® Data Mover Library (Intel® DML)☆93Updated 6 months ago
- ROB size testing utility☆158Updated 3 years ago
- PROGRESS64 is a C library of scalable functions for concurrent programs, primarily focused on networking applications.☆93Updated last month
- Conversion to/from half-precision floating point formats☆372Updated 2 months ago
- Collection of synchronization micro-benchmarks and traces from infrastructure applications☆48Updated 2 months ago
- Test if AVX vector loads and stores are atomic☆33Updated 5 years ago
- A lightweight memory allocator for hardware-accelerated machine learning☆170Updated 3 weeks ago
- SYCL Reference Manual☆28Updated last year
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of AVX, AVX2, AVX-512 vector intrinsics to enable so…☆58Updated 2 years ago
- A collection of performance analysis tools, recipes, handy scripts, microbenchmarks & more☆141Updated 4 months ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆43Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆108Updated 8 years ago
- CUPTI GPU Profiler☆40Updated 6 years ago
- Tools and Reference Code for Intel Optimizations (e.g. Large Pages)☆146Updated last month
- AVX512F and AVX2 versions of quick sort☆104Updated 7 years ago