Triple-Z / AVX-AVX2-Example-Code

Example code for Intel AVX / AVX2 intrinsics.

☆138

Alternatives and similar repositories for AVX-AVX2-Example-Code

Users who are interested in AVX-AVX2-Example-Code compare it to the repositories listed below.

kunpengcompute / AvxToNeon
Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
☆123 Updated last year
flame / blislab
BLISlab: A Sandbox for Optimizing GEMM
☆531 Updated 4 years ago
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84 Updated last year
cyanguwa / nersc-roofline
☆45 Updated 4 years ago
dumerrill / merge-spmv
☆93 Updated 8 years ago
zjin-lcf / HeCBench
☆248 Updated last month
pigirons / sgemm_hsw
This is an implementation of sgemm_kernel on L1d cache.
☆229 Updated last year
kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆96 Updated last year
yzhaiustc / Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
☆150 Updated 3 years ago
PAA-NCIC / PPoPP2017_artifact
Third party assembler and GEMM library for NVIDIA Kepler GPU
☆81 Updated 5 years ago
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106 Updated 7 years ago
puckbee / CVR
Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL)
☆24 Updated last year
Apress / data-parallel-CPP
Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A…
☆275 Updated 3 months ago
pigirons / cpufp
A CPU tool for benchmarking peak floating-point performance
☆554 Updated last week
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆33 Updated 4 years ago
weifengliu-ssslab / Benchmark_SpMV_using_CSR5
CSR5-based SpMV on CPUs, GPUs and Xeon Phi
☆105 Updated last year
carlushuang / cpu_gemm_opt
How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.
☆70 Updated 6 years ago
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆133 Updated last year
jeffhammond / dpcpp-tutorial
Intel Data Parallel C++ (and SYCL 2020) Tutorial.
☆93 Updated 3 years ago
owensgroup / GpuBTree
Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019
☆57 Updated 3 years ago
FdyCN / PTX-ISA
CUDA PTX-ISA Document, Chinese translation
☆44 Updated last month
fsword73 / HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
☆60 Updated 4 years ago
njuhope / cuda_sgemm
☆113 Updated last year
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
icl-utk-edu / papi
☆163 Updated last week
ndd314 / cuda_examples
☆67 Updated 11 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆138 Updated 4 years ago
SunsetQuest / CudaPAD
CudaPAD is a PTX/SASS viewer for NVIDIA Cuda kernels and provides an on-the-fly view of the assembly.
☆119 Updated 2 years ago
lixiuhong / batched_gemm
☆39 Updated 5 years ago
zenny-chen / Intel-AVX512-Brief-Introduction
A brief introduction to Intel AVX-512
☆50 Updated last year