VictorRodriguez / AVX-SG
Advanced Vector Extensions (AVX) basic tutorial
☆37 · Updated 3 years ago
Related projects
Alternatives and complementary repositories for AVX-SG
- The SHOC Benchmark Suite ☆247 · Updated 2 years ago
- Tools and extensions for CUDA profiling ☆63 · Updated 4 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- Example code for Intel AVX / AVX2 intrinsics. ☆128 · Updated last year
- Kernel Fusion and Runtime Compilation Based on NNVM ☆69 · Updated 8 years ago
- ParaDnn: A systematic performance analysis methodology for deep learning. ☆39 · Updated 4 years ago
- TLB Benchmarks ☆32 · Updated 7 years ago
- ☆90 · Updated 7 years ago
- A simple demonstration of how PyTorch autograd works ☆16 · Updated 3 years ago
- PyTorch process group third-party plugin for UCC ☆20 · Updated 7 months ago
- GPU Performance Advisor ☆63 · Updated 2 years ago
- Provides a set of benchmarks that can be used to measure the memory bandwidth performance of CPUs ☆81 · Updated 7 months ago
- Benchmark for measuring the performance of sparse and irregular memory access. ☆75 · Updated this week
- Chai ☆42 · Updated 11 months ago
- ☆47 · Updated 5 years ago
- Tests and benchmarks for cuDNN (and, in the future, other NVIDIA libraries) ☆53 · Updated 4 years ago
- A tool for examining GPU scheduling behavior. ☆70 · Updated 3 months ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆78 · Updated 5 years ago
- Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction. ☆11 · Updated last year
- Short examples illustrating AVX2 intrinsics for simple tasks. ☆83 · Updated 8 months ago
- End-to-end steps for adding custom ops in PyTorch. ☆19 · Updated 4 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆60 · Updated 6 years ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆55 · Updated last month
- Code used for generating charts and measurements of nontemporal stores ☆9 · Updated 6 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆58 · Updated 2 years ago
- ☆41 · Updated 4 years ago
- GPUDirect Async support for IB Verbs ☆90 · Updated 2 years ago
- ☆58 · Updated last month
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆124 · Updated last year
- GVProf: A Value Profiler for GPU-based Clusters ☆48 · Updated 8 months ago