Short examples illustrating AVX2 intrinsics for simple tasks.
☆101 · Mar 13, 2024 · Updated 2 years ago
Alternatives and similar repositories for avx2-examples
Users that are interested in avx2-examples are comparing it to the libraries listed below.
- Example code for Intel AVX / AVX2 intrinsics. ☆146 · Sep 18, 2023 · Updated 2 years ago
- Dictionary compressor with nibbled ANS and optimal parsing. Other compression experiments. ☆26 · Apr 13, 2025 · Updated last year
- An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations ☆21 · Apr 14, 2020 · Updated 6 years ago
- Compiler plugin for performance analysis of HIP applications ☆14 · Apr 7, 2025 · Updated last year
- Some of the fastest decoding range-based Asymmetric Numeral Systems (rANS) codecs for x64 ☆20 · Sep 3, 2024 · Updated last year
- A low-overhead tool to periodically collect system-wide hardware performance counters on Intel64 systems. ☆32 · Aug 2, 2022 · Updated 3 years ago
- An implementation of SGEMV with performance comparable to cuBLAS. ☆12 · May 21, 2021 · Updated 5 years ago
- The ultimate bandwidth benchmark ☆66 · Apr 5, 2026 · Updated 2 months ago
- cuDTW++: Ultra-Fast Dynamic Time Warping on CUDA-enabled GPUs ☆34 · May 11, 2020 · Updated 6 years ago
- A portable implementation of SZ lossy compression for AMD GPUs and Hygon DCUs. ☆10 · Feb 26, 2025 · Updated last year
- The vOW4SIKE project provides C code that implements the parallel collision search algorithm by van Oorschot and Wiener (vOW). The algori… ☆12 · May 25, 2021 · Updated 5 years ago
- Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang… ☆13 · Aug 12, 2022 · Updated 3 years ago
- Testing AVX capabilities with GCC ☆11 · Jan 24, 2016 · Updated 10 years ago
- Nonequispaced FFTs on GPUs (based on NFFT: http://www.nfft.org) ☆11 · Apr 30, 2018 · Updated 8 years ago
- Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability ☆18 · Jun 26, 2023 · Updated 2 years ago
- Raytracing test that shows how primary rays are dispatched on the GPU. ☆13 · Sep 1, 2019 · Updated 6 years ago
- Measure instruction latency and throughput ☆31 · Sep 2, 2025 · Updated 9 months ago
- Elastic and fault tolerant parallel map and parallel map reduce methods. Part of the COFII framework. ☆16 · Jun 3, 2026 · Updated 2 weeks ago
- Slurm Examples ☆10 · Aug 30, 2024 · Updated last year
- A Benchmark Toolkit for Assembly Instructions Using the LLVM JIT ☆18 · Oct 26, 2020 · Updated 5 years ago
- TVM learning and research ☆13 · Jan 8, 2021 · Updated 5 years ago
- ☆11 · Mar 15, 2023 · Updated 3 years ago
- The code for an FPGA softcore comparison ☆11 · Jun 21, 2020 · Updated 5 years ago
- ☆10 · May 4, 2023 · Updated 3 years ago
- Performance Benchmark Recipes for Power-based systems ☆14 · Apr 12, 2019 · Updated 7 years ago
- The Mixing method: coordinate descent for low-rank semidefinite programming ☆15 · Apr 30, 2021 · Updated 5 years ago
- Mining CryptoNight Haven on the Varium C1100 ☆10 · Apr 1, 2022 · Updated 4 years ago
- A simple cycle accurate template model for ASIC/FPGA hardware design. Including a cycle accurate FIFO design example. More designs are co… ☆17 · Sep 5, 2019 · Updated 6 years ago
- HCC Sample Applications ☆13 · Jan 3, 2017 · Updated 9 years ago
- Implement asm gemm on vega64 for 4096x4096 fp32 matrix ☆22 · Oct 12, 2019 · Updated 6 years ago
- Presentation materials for the 2016 Berkeley C++ Summit ☆14 · Oct 20, 2016 · Updated 9 years ago
- Fast streams for block gzip files. ☆14 · Nov 11, 2025 · Updated 7 months ago
- ☆11 · Apr 16, 2026 · Updated 2 months ago
- ☆11 · Jan 9, 2021 · Updated 5 years ago
- ☆13 · Sep 19, 2024 · Updated last year
- Einsum Expressions in Julia ☆14 · Aug 2, 2025 · Updated 10 months ago
- The fastest Tropical number matrix multiplication on GPU ☆10 · Aug 23, 2025 · Updated 9 months ago
- Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs ☆14 · Apr 3, 2025 · Updated last year
- Implementation of Brakerski's leveled homomorphic encryption system ☆44 · Feb 12, 2017 · Updated 9 years ago