VictorRodriguez / AVX-SG
Advanced Vector Extensions (AVX) basic tutorial
☆37 Updated 3 years ago
Alternatives and similar repositories for AVX-SG:
Users who are interested in AVX-SG are comparing it to the libraries listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆134 Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Power measurement for CUDA programs by polling using NVIDIA Management Library (NVML) APIs. ☆24 Updated 7 years ago
- Short examples illustrating AVX2 intrinsics for simple tasks. ☆87 Updated 11 months ago
- Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL) ☆24 Updated last year
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆128 Updated last year
- A tool for examining GPU scheduling behavior. ☆71 Updated 6 months ago
- TLB Benchmarks ☆33 Updated 7 years ago
- Statistics on GPUs ☆29 Updated 5 months ago
- GPUDirect Async support for IB Verbs ☆104 Updated 2 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 Updated 8 years ago
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆18 Updated 9 years ago
- Chai ☆42 Updated last year
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆58 Updated 2 years ago
- Pointer-chasing memory benchmark (forked from Doug Pase's code). ☆59 Updated 11 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆84 Updated 10 months ago
- A Deep Learning Meta-Framework and HPC Benchmarking Library ☆81 Updated 2 years ago
- Test bench and scripts for testing VCL ☆10 Updated last year
- A 128-bit unsigned integer class for CUDA ☆43 Updated 2 months ago
- Artifact Evaluation Reproduction for "Software Prefetching for Indirect Memory Accesses", CGO 2017, using CK. ☆38 Updated 3 years ago
- Flexible GPGPU instrumentation ☆86 Updated 5 years ago
- ☆42 Updated 4 years ago
- Multiple 1-stencil implementations using NVIDIA CUDA. ☆13 Updated 7 years ago
- ☆51 Updated 5 years ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated 8 months ago
- Assembler for NVIDIA Fermi. Imported from Google Code ☆72 Updated 9 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆80 Updated 5 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆28 Updated 5 months ago
- A framework for pipelined computing on GPU ☆29 Updated 5 years ago
- Tests and benchmarks for cuDNN (and in the future, other NVIDIA libraries) ☆53 Updated 4 years ago