kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆98 · Updated last year
Alternatives and similar repositories for avx2-examples
Users who are interested in avx2-examples are comparing it to the libraries listed below.
- ☆185 · Updated last week
- Example code for Intel AVX / AVX2 intrinsics. ☆142 · Updated 2 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆135 · Updated 2 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆260 · Updated 10 months ago
- Benchmark for measuring the performance of sparse and irregular memory access. ☆80 · Updated 3 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆109 · Updated 8 years ago
- Kernel Tuning Toolkit ☆65 · Updated last week
- Tools to create performance and roofline plots from measured data ☆60 · Updated 11 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆215 · Updated 9 months ago
- Agenium Scale vectorization library for CPUs and GPUs ☆334 · Updated 4 years ago
- A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP) ☆427 · Updated 10 months ago
- Little OpenMP Library ☆168 · Updated 3 years ago
- SYCL Open Source Specification ☆139 · Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆124 · Updated last week
- ☆267 · Updated last week
- CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly. ☆124 · Updated 2 years ago
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆282 · Updated 7 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆255 · Updated last week
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆91 · Updated last year
- Demonstration of various hardware effects on CUDA GPUs. ☆389 · Updated 2 years ago
- Online CUDA Occupancy Calculator ☆80 · Updated 4 years ago
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆165 · Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆145 · Updated last week
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆160 · Updated 4 months ago
- SYCL Benchmark Suite ☆65 · Updated 5 months ago
- ☆288 · Updated 2 months ago
- TPP experimentation on MLIR for linear algebra ☆139 · Updated this week
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 · Updated 3 years ago
- ☆62 · Updated 11 months ago
- GPU-accelerated, error-bounded lossy compression for scientific data. ☆91 · Updated last week