kunpengcompute / AvxToNeon
Encapsulates frequently used AVX instructions as independent modules to reduce repeated development work.
☆121Updated last year
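To illustrate the idea in the description, here is a minimal sketch of how a 256-bit AVX-style operation could be wrapped as a small module built from two 128-bit NEON halves. All names below (`sketch_m256`, `sketch_loadu_ps`, `sketch_add_ps`, `sketch_storeu_ps`) are hypothetical and are not taken from AvxToNeon itself; the scalar fallback is an added assumption so the sketch also compiles on machines without NEON.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#else
#define SKETCH_HAVE_NEON 0
#endif

/* Hypothetical stand-in for a 256-bit float vector: two 128-bit halves. */
typedef struct {
#if SKETCH_HAVE_NEON
    float32x4_t half[2];
#else
    float half[2][4];
#endif
} sketch_m256;

/* Load 8 unaligned floats (analogous in spirit to an AVX unaligned load). */
static inline sketch_m256 sketch_loadu_ps(const float *p) {
    sketch_m256 r;
#if SKETCH_HAVE_NEON
    r.half[0] = vld1q_f32(p);
    r.half[1] = vld1q_f32(p + 4);
#else
    for (size_t i = 0; i < 8; ++i) r.half[i / 4][i % 4] = p[i];
#endif
    return r;
}

/* Lane-wise addition of 8 floats, done as two 4-lane additions. */
static inline sketch_m256 sketch_add_ps(sketch_m256 a, sketch_m256 b) {
    sketch_m256 r;
#if SKETCH_HAVE_NEON
    r.half[0] = vaddq_f32(a.half[0], b.half[0]);
    r.half[1] = vaddq_f32(a.half[1], b.half[1]);
#else
    for (size_t i = 0; i < 8; ++i)
        r.half[i / 4][i % 4] = a.half[i / 4][i % 4] + b.half[i / 4][i % 4];
#endif
    return r;
}

/* Store 8 floats to unaligned memory. */
static inline void sketch_storeu_ps(float *p, sketch_m256 v) {
#if SKETCH_HAVE_NEON
    vst1q_f32(p, v.half[0]);
    vst1q_f32(p + 4, v.half[1]);
#else
    for (size_t i = 0; i < 8; ++i) p[i] = v.half[i / 4][i % 4];
#endif
}
```

Under these assumptions, code written against the wrapper functions stays the same whether the halves are backed by NEON registers or by the scalar fallback, which is the kind of reuse the description refers to.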
Alternatives and similar repositories for AvxToNeon:
Users who are interested in AvxToNeon are comparing it to the libraries listed below.
- Assembler for NVIDIA FERMI. Imported from Google Code☆72Updated 10 years ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc…☆60Updated 6 months ago
- Example code for Intel AVX / AVX2 intrinsics.☆137Updated last year
- ☆151Updated 3 weeks ago
- ☆56Updated last month
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆38Updated 3 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆89Updated last year
- Portable 128-bit SIMD intrinsics☆58Updated last year
- A profiler to disclose and quantify hardware features on GPUs.☆168Updated 2 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.☆118Updated 2 years ago
- SYCL Reference Manual☆27Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.☆130Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆104Updated 7 years ago
- GPU B-Tree with support for versioning (snapshots).☆47Updated 6 months ago
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API☆109Updated last week
- Emulating DMA Engines on GPUs for Performance and Portability☆39Updated 9 years ago
- Arm C Language Extensions (ACLE)☆104Updated this week
- SYCL Open Source Specification☆134Updated this week
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i…☆14Updated last year
- Collection of synchronization micro-benchmarks and traces from infrastructure applications☆41Updated 3 months ago
- ☆201Updated last month
- Intel® GPU Compute Samples☆106Updated last month
- ☆51Updated 5 years ago
- GPUDirect Async support for IB Verbs☆112Updated 2 years ago
- GPU-Accelerated Lossless Data Compressors Survey☆115Updated 4 years ago
- Conversion to/from half-precision floating point formats☆349Updated 9 months ago
- Introduction to Intel AVX-512☆47Updated last year
- SYCL Conformance Tests☆70Updated this week
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆182Updated last year
- A lightweight memory allocator for hardware-accelerated machine learning☆148Updated last month