kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆95 · Updated last year
Alternatives and similar repositories for avx2-examples
Users who are interested in avx2-examples are comparing it to the libraries listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆138 · Updated last year
- MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels. ☆133 · Updated last year
- Omnitrace: Application Profiling, Tracing, and Analysis ☆318 · Updated this week
- SYCL Open Source Specification ☆136 · Updated last week
- Benchmark for measuring the performance of sparse and irregular memory access. ☆78 · Updated last month
- Haystack is an analytical cache model that, given a program, computes the number of cache misses. ☆46 · Updated 5 years ago
- Archived implementation of BLAS using the SYCL open standard. See oneMath for a replacement. ☆261 · Updated 5 months ago
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly. ☆119 · Updated 2 years ago
- Advanced Profiling and Analytics for AMD Hardware ☆157 · Updated this week
- ☆156 · Updated last week
- Third-party assembler and GEMM library for NVIDIA Kepler GPUs ☆81 · Updated 5 years ago
- Kernel Tuning Toolkit ☆60 · Updated last month
- Intel Data Parallel C++ (and SYCL 2020) Tutorial ☆93 · Updated 3 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆105 · Updated 7 years ago
- Development repository for the Open Earth Compiler ☆80 · Updated 4 years ago
- TLB Benchmarks ☆34 · Updated 7 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆134 · Updated last week
- An implementation of HIP that works on CPUs, across OSes. ☆121 · Updated last year
- Agenium Scale vectorization library for CPUs and GPUs ☆333 · Updated 3 years ago
- A 128-bit unsigned integer class for CUDA ☆46 · Updated 5 months ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. ☆66 · Updated last month
- Assembler for NVIDIA Fermi. Imported from Google Code. ☆72 · Updated 10 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆119 · Updated last week
- Implementation of TSM2L and TSM2R: High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 · Updated 4 years ago
- ☆62 · Updated 6 months ago
- portDNN is a library implementing neural network algorithms written using SYCL. ☆113 · Updated last year
- rocWMMA ☆115 · Updated last week
- Advanced Vector Extensions (AVX) basic tutorial ☆37 · Updated 4 years ago
- Demonstration of various hardware effects on CUDA GPUs. ☆382 · Updated last year
- ☆44 · Updated 4 years ago