kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆98 Updated last year
Alternatives and similar repositories for avx2-examples
Users who are interested in avx2-examples compare it to the repositories listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆144 Updated 2 years ago
- ☆197 Updated this week
- Kernel Tuning Toolkit ☆65 Updated last month
- CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups ☆235 Updated 10 months ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆138 Updated 2 years ago
- Little OpenMP Library ☆169 Updated 3 years ago
- A GPU accelerated error-bounded lossy compression for scientific data. ☆94 Updated last week
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 Updated 11 months ago
- tools to create performance and roofline plots from measured data ☆60 Updated 11 years ago
- Agenium Scale vectorization library for CPUs and GPUs ☆337 Updated 4 years ago
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts ☆230 Updated last year
- SIMD Library for Evaluating Elementary Functions, vectorized libm and DFT ☆797 Updated 2 weeks ago
- A 128 bit unsigned integer class for CUDA ☆46 Updated last year
- Conversion to/from half-precision floating point formats ☆379 Updated 4 months ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆64 Updated 3 months ago
- ☆137 Updated 2 years ago
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆126 Updated 2 years ago
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆281 Updated 9 months ago
- Benchmark for measuring the performance of sparse and irregular memory access. ☆82 Updated 4 months ago
- Demonstration of various hardware effects on CUDA GPUs. ☆390 Updated 2 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 Updated 8 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆85 Updated 6 years ago
- This is a set of simple programs that can be used to explore the features of a parallel platform. ☆470 Updated 4 months ago
- ☆48 Updated 5 years ago
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆568 Updated 3 months ago
- Modular C++ Toolkit for Performance Analysis and Logging. Profiling API and Tools for C, C++, CUDA, Fortran, and Python. The C++ template… ☆366 Updated last year
- ☆270 Updated this week
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆212 Updated last month
- SYCL Open Source Specification ☆143 Updated 2 weeks ago
- Monorepo for the OpenCilk compiler. Forked from llvm/llvm-project and based on Tapir/LLVM. ☆119 Updated this week