kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆87 · Updated last year
Alternatives and similar repositories for avx2-examples:
Users who are interested in avx2-examples compare it to the libraries listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆136 · Updated last year
- Agenium Scale vectorization library for CPUs and GPUs ☆330 · Updated 3 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆115 · Updated 2 years ago
- Full-speed Array of Structures access ☆164 · Updated last year
- Benchmark for measuring the performance of sparse and irregular memory access. ☆77 · Updated last month
- A 128 bit unsigned integer class for CUDA ☆43 · Updated 2 months ago
- TLB Benchmarks ☆33 · Updated 7 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆128 · Updated last year
- Omnitrace: Application Profiling, Tracing, and Analysis ☆309 · Updated last week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last year
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆63 · Updated this week
- Advanced Vector Extensions (AVX) basic tutorial ☆37 · Updated 3 years ago
- The Berkeley Container Library ☆124 · Updated last year
- ☆43 · Updated 4 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆80 · Updated 5 years ago
- Utilities to measure read access times of caches, memory, and hardware prefetches for simple and fused operations ☆82 · Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 · Updated 7 years ago
- Massively Parallel Huffman Decoding on GPUs ☆47 · Updated 6 years ago
- Little OpenMP Library ☆157 · Updated 2 years ago
- Header-only C++ library for low precision floating point type emulation. ☆169 · Updated 5 years ago
- An implementation of HIP that works on CPUs, across OSes. ☆115 · Updated 11 months ago
- ☆56 · Updated last week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆262 · Updated 2 months ago
- Kernel Tuning Toolkit ☆59 · Updated last month
- CUDA kernel author's tools ☆110 · Updated 2 years ago
- A task benchmark ☆41 · Updated 7 months ago
- tools to create performance and roofline plots from measured data ☆58 · Updated 10 years ago
- assembler for NVIDIA FERMI. Imported from Google Code ☆72 · Updated 9 years ago
- ☆16 · Updated 2 years ago
- ☆68 · Updated 4 years ago