kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆96 Updated last year
Alternatives and similar repositories for avx2-examples
Users who are interested in avx2-examples are comparing it to the repositories listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆138 Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133 Updated last year
- ☆163 Updated last week
- Benchmark for measuring the performance of sparse and irregular memory access. ☆78 Updated 2 months ago
- Demonstration of various hardware effects on CUDA GPUs. ☆383 Updated last year
- Agenium Scale vectorization library for CPUs and GPUs ☆333 Updated 3 years ago
- tools to create performance and roofline plots from measured data ☆59 Updated 11 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 Updated 6 months ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆119 Updated 2 years ago
- A GPU accelerated error-bounded lossy compression for scientific data. ☆86 Updated last month
- Conversion to/from half-precision floating point formats ☆357 Updated 11 months ago
- Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template… ☆361 Updated 11 months ago
- ☆134 Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆106 Updated 7 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts ☆216 Updated 8 months ago
- An implementation of HIP that works on CPUs, across OSes. ☆121 Updated last year
- ☆16 Updated 3 years ago
- Next generation FFT implementation for ROCm ☆195 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆119 Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆137 Updated this week
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆81 Updated 5 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆201 Updated 5 months ago
- ☆45 Updated 4 years ago
- TPP experimentation on MLIR for linear algebra ☆132 Updated last week
- Kernel Tuning Toolkit ☆61 Updated 3 weeks ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated last year
- CUDA kernel author's tools ☆111 Updated 3 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆172 Updated last week
- Full-speed Array of Structures access ☆171 Updated 2 years ago