kunpengcompute / AvxToNeon
Encapsulates frequently used AVX instructions as independent modules to reduce repeated development work.
☆123 · Updated last year
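The core idea of such a translation layer can be sketched in C. NEON has no single 256-bit register, so one common approach is to represent an AVX `__m256` as a pair of 128-bit NEON registers and apply each NEON intrinsic to both halves. The sketch below is a minimal illustration under that assumption. The type `m256_emu` and the `emu_mm256_*` functions are hypothetical names, not AvxToNeon's actual API. The NEON intrinsics it uses (`vld1q_f32`, `vaddq_f32`, `vst1q_f32`) are standard ACLE intrinsics. A scalar fallback is included so the sketch also compiles on non-Arm hosts.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical stand-in for __m256: two 128-bit NEON halves. */
typedef struct {
    float32x4_t lo; /* lanes 0..3 */
    float32x4_t hi; /* lanes 4..7 */
} m256_emu;

/* Emulates _mm256_loadu_ps: unaligned load of 8 floats. */
static m256_emu emu_mm256_loadu_ps(const float *p) {
    assert(p != NULL);
    m256_emu r;
    r.lo = vld1q_f32(p);
    r.hi = vld1q_f32(p + 4);
    return r;
}

/* Emulates _mm256_storeu_ps: unaligned store of 8 floats. */
static void emu_mm256_storeu_ps(float *p, m256_emu a) {
    assert(p != NULL);
    vst1q_f32(p, a.lo);
    vst1q_f32(p + 4, a.hi);
}

/* Emulates _mm256_add_ps: one NEON add per 128-bit half. */
static m256_emu emu_mm256_add_ps(m256_emu a, m256_emu b) {
    m256_emu r;
    r.lo = vaddq_f32(a.lo, b.lo);
    r.hi = vaddq_f32(a.hi, b.hi);
    return r;
}

#else
/* Scalar fallback (assumption: no NEON available), same interface. */
typedef struct {
    float v[8];
} m256_emu;

static m256_emu emu_mm256_loadu_ps(const float *p) {
    assert(p != NULL);
    m256_emu r;
    for (int i = 0; i < 8; i++) r.v[i] = p[i];
    return r;
}

static void emu_mm256_storeu_ps(float *p, m256_emu a) {
    assert(p != NULL);
    for (int i = 0; i < 8; i++) p[i] = a.v[i];
}

static m256_emu emu_mm256_add_ps(m256_emu a, m256_emu b) {
    m256_emu r;
    for (int i = 0; i < 8; i++) r.v[i] = a.v[i] + b.v[i];
    return r;
}
#endif
```

Here is an example. Load `{0, 1, ..., 7}` and eight copies of `10.0f`, add them with `emu_mm256_add_ps`, and store the sum. The stored result is `{10, 11, ..., 17}`. Splitting the vector this way keeps each half to a single NEON instruction. The cost is that one AVX instruction becomes two.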
Alternatives and similar repositories for AvxToNeon
Users who are interested in AvxToNeon compare it to the libraries listed below.
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆65 · Updated 9 months ago
- Example code for Intel AVX / AVX2 intrinsics. ☆139 · Updated last year
- GPU-Accelerated Lossless Data Compressors Survey ☆117 · Updated 4 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆91 · Updated last year
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆119 · Updated 2 years ago
- ☆58 · Updated last week
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts ☆220 · Updated 9 months ago
- A profiler to disclose and quantify hardware features on GPUs. ☆173 · Updated 3 years ago
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs ☆118 · Updated last month
- Arm C Language Extensions (ACLE) ☆110 · Updated last month
- ☆151 · Updated last week
- Intel® Data Mover Library (Intel® DML) ☆94 · Updated 4 months ago
- PROGRESS64 is a C library of scalable functions for concurrent programs, primarily focused on networking applications. ☆92 · Updated 3 months ago
- Assembler for NVIDIA Fermi, imported from Google Code ☆72 · Updated 10 years ago
- Utilities to measure read access times of caches, memory, and hardware prefetches for simple and fused operations ☆84 · Updated last year
- Fast AVX512 (AVX-512) quicksort + bitonic sort. ☆28 · Updated 3 years ago
- Intel® Query Processing Library (Intel® QPL) ☆104 · Updated 3 weeks ago
- Tools and Reference Code for Intel Optimizations (e.g. Large Pages) ☆145 · Updated 10 months ago
- GPU B-Tree with support for versioning (snapshots). ☆49 · Updated 9 months ago
- Collection of synchronization micro-benchmarks and traces from infrastructure applications ☆45 · Updated last month
- A translator from NVPTX to SPIR-V, modified from the LLVM-SPIR-V Translator. ☆40 · Updated 3 years ago
- Conversion to/from half-precision floating point formats ☆362 · Updated last year
- Bitonic sort using SIMD (AVX/NEON) instructions ☆14 · Updated 3 years ago
- ARMv8 performance monitor from userspace ☆78 · Updated 2 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 · Updated 9 years ago
- Simple benchmark for memory throughput and latency ☆386 · Updated 2 years ago
- A lightweight memory allocator for hardware-accelerated machine learning ☆155 · Updated 4 months ago
- ☆37 · Updated last year
- The platform-independent header that allows compiling any C/C++ code containing ARM NEON intrinsic functions for x86 target systems using S… ☆469 · Updated 2 months ago
- ☆59 · Updated 2 years ago