kshitijl / avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
☆82 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for avx2-examples
- Example code for Intel AVX / AVX2 intrinsics. ☆125 · Updated last year
- Benchmark for measuring the performance of sparse and irregular memory access. ☆75 · Updated 2 weeks ago
- A 128 bit unsigned integer class for CUDA ☆43 · Updated 2 years ago
- Massively Parallel Huffman Decoding on GPUs ☆42 · Updated 5 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆99 · Updated 7 years ago
- Little OpenMP Library ☆155 · Updated 2 years ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆99 · Updated this week
- SYCL Open Source Specification ☆114 · Updated this week
- An implementation of HIP that works on CPUs, across OSes. ☆112 · Updated 7 months ago
- Kernel Tuning Toolkit ☆55 · Updated last week
- The Berkeley Container Library ☆120 · Updated last year
- Tools to create performance and roofline plots from measured data ☆58 · Updated 10 years ago
- GPU-Accelerated Lossless Data Compressors Survey ☆110 · Updated 4 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆83 · Updated 8 months ago
- TLB Benchmarks ☆32 · Updated 7 years ago
- High-level C++ for Accelerator Clusters ☆142 · Updated this week
- ☆41 · Updated 4 years ago
- Concurrent CPU-GPU Programming using Task Models ☆100 · Updated 4 years ago
- Header-only C++ library for low precision floating point type emulation. ☆163 · Updated 4 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆42 · Updated 10 months ago
- ☆31 · Updated 3 years ago
- Conversion to/from half-precision floating point formats ☆330 · Updated 3 months ago
- SYCL Benchmark Suite ☆56 · Updated 2 months ago
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload. ☆114 · Updated 9 months ago
- ☆132 · Updated last year
- ☆16 · Updated 3 years ago
- CUDA kernel author's tools ☆107 · Updated 2 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆77 · Updated 5 years ago
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated last week
- ulmBLAS ☆104 · Updated 2 years ago