kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆98Updated last year
Alternatives and similar repositories for avx2-examples
Users who are interested in avx2-examples compare it to the repositories listed below.
- Example code for Intel AVX / AVX2 intrinsics.☆142Updated 2 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.☆138Updated 2 years ago
- Kernel Tuning Toolkit☆65Updated last month
- Benchmark for measuring the performance of sparse and irregular memory access.☆82Updated 3 months ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement.☆260Updated 11 months ago
- ☆189Updated last week
- GPUOcelot: A dynamic compilation framework for PTX☆218Updated 10 months ago
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts☆229Updated last year
- Tools to create performance and roofline plots from measured data☆60Updated 11 years ago
- Utilities to measure read access times of caches, memory, and hardware prefetches for simple and fused operations☆85Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆124Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo☆165Updated this week
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.☆125Updated 2 years ago
- A GPU-accelerated, error-bounded lossy compressor for scientific data.☆92Updated last month
- An implementation of HIP that works on CPUs, across OSes.☆130Updated last year
- SYCL Open Source Specification☆139Updated last month
- ☆136Updated 2 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.☆145Updated last week
- Full-speed Array of Structures access☆176Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆256Updated last week
- Little OpenMP Library☆169Updated 3 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU☆82Updated 6 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo☆198Updated last week
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆56Updated 8 months ago
- Haystack is an analytical cache model that, given a program, computes the number of cache misses.☆46Updated 6 years ago
- Pluto: An automatic polyhedral parallelizer and locality optimizer☆311Updated 3 months ago
- GPU-Accelerated Lossless Data Compressors Survey☆121Updated 5 years ago
- Conversions to MLIR EmitC☆134Updated last year
- SYCL Benchmark Suite☆66Updated 5 months ago
- Agenium Scale vectorization library for CPUs and GPUs☆334Updated 4 years ago