kunpengcompute / AvxToNeon
Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
☆113 · Updated 8 months ago
Related projects:
- Example code for Intel AVX / AVX2 intrinsics. ☆123 · Updated last year
- Assembler for NVIDIA Fermi. Imported from Google Code. ☆68 · Updated 9 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly. ☆107 · Updated last year
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API ☆85 · Updated last week
- GPU-Accelerated Lossless Data Compressors Survey ☆110 · Updated 4 years ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆54 · Updated last month
- ☆145 · Updated 3 weeks ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs. ☆77 · Updated 5 months ago
- A collection of performance analysis tools, recipes, handy scripts, microbenchmarks & more ☆107 · Updated this week
- A profiler to disclose and quantify hardware features on GPUs. ☆158 · Updated 2 years ago
- ☆52 · Updated last week
- SYCL Reference Manual ☆25 · Updated 4 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆96 · Updated 7 years ago
- A tool for examining GPU scheduling behavior. ☆67 · Updated last month
- Intel® Data Mover Library (Intel® DML) ☆83 · Updated this week
- Tools and Reference Code for Intel Optimizations (e.g., Large Pages) ☆130 · Updated 3 weeks ago
- Benchmarks for auto-vectorization and revectorization, including both hand-vectorized and scalar code ☆24 · Updated 5 years ago
- Collection of synchronization micro-benchmarks and traces from infrastructure applications ☆38 · Updated 4 months ago
- A lightweight memory allocator for hardware-accelerated machine learning ☆114 · Updated last month
- ROB size testing utility ☆128 · Updated 2 years ago
- MatMul Performance Benchmarks for a Single CPU Core, comparing both hand-engineered and codegen kernels. ☆123 · Updated 11 months ago
- Intel® Query Processing Library (Intel® QPL) ☆96 · Updated last week
- Portable 128-bit SIMD intrinsics ☆55 · Updated last year
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts ☆183 · Updated 7 months ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆185 · Updated this week
- ☆131 · Updated 3 weeks ago
- ☆53 · Updated last week
- Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs. ☆66 · Updated last year
- SYCL Conformance Tests ☆60 · Updated last week
- ☆225 · Updated this week