kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆95Updated last year
Alternatives and similar repositories for avx2-examples
Users who are interested in avx2-examples are comparing it to the repositories listed below.
- Example code for Intel AVX / AVX2 intrinsics.☆138Updated last year
- Agenium Scale vectorization library for CPUs and GPUs☆333Updated 3 years ago
- ☆150Updated last week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.☆131Updated last year
- Benchmark for measuring the performance of sparse and irregular memory access.☆78Updated last month
- ☆44Updated 4 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆70Updated 9 years ago
- Kernel Tuning Toolkit☆59Updated 3 weeks ago
- Measure instruction latency and throughput☆24Updated 3 months ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs☆89Updated last year
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme…☆17Updated last year
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆52Updated 2 months ago
- tools to create performance and roofline plots from measured data☆58Updated 10 years ago
- ☆52Updated 5 years ago
- The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering a small but powerful set of linear …☆75Updated last week
- CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly.☆119Updated 2 years ago
- ☆134Updated 2 years ago
- A GPU accelerated error-bounded lossy compression for scientific data.☆75Updated last week
- The Berkeley Container Library☆124Updated last year
- An implementation of HIP that works on CPUs, across OSes.☆120Updated last year
- Storage for my snippets, toy programs, etc.☆357Updated 2 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆105Updated 7 years ago
- Omnitrace: Application Profiling, Tracing, and Analysis☆312Updated last week
- Little OpenMP Library☆161Updated 2 years ago
- BLAS implementation for Intel FPGA☆78Updated 4 years ago
- AVX-optimized sin(), cos(), exp() and log() functions☆124Updated 3 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU☆81Updated 5 years ago
- TLB Benchmarks☆34Updated 7 years ago
- Haystack is an analytical cache model that, given a program, computes the number of cache misses.☆46Updated 5 years ago
- GPU-Accelerated Lossless Data Compressors Survey☆115Updated 4 years ago