satishphd / Teaching-Intel-Intrinsics-for-SIMD-Parallelism
Teaching Vectorization and SIMD using Intel Intrinsics in a Computer Organization and Architecture class
☆15 Updated 5 months ago
Alternatives and similar repositories for Teaching-Intel-Intrinsics-for-SIMD-Parallelism
Users who are interested in Teaching-Intel-Intrinsics-for-SIMD-Parallelism are comparing it to the libraries listed below.
- SYCL Reference Manual ☆28 Updated last year
- ☆58 Updated last month
- A header only library implementing common mathematical functions using SIMD intrinsics ☆109 Updated last week
- SYCL Conformance Tests ☆70 Updated last week
- Little OpenMP Library ☆163 Updated 2 years ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆108 Updated last week
- SYCL Benchmark Suite ☆65 Updated 3 weeks ago
- SYCL Open Source Specification ☆136 Updated this week
- Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload. ☆123 Updated last year
- ☆144 Updated 2 weeks ago
- Agenium Scale vectorization library for CPUs and GPUs ☆333 Updated 3 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 Updated 3 months ago
- ☆151 Updated this week
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs ☆117 Updated 3 weeks ago
- Distributed ranges is a generalization of C++ ranges for distributed data structures. ☆51 Updated 2 weeks ago
- x86-64, ARM, and RVV intrinsics viewer ☆53 Updated 3 months ago
- ☆141 Updated 2 weeks ago
- A lightweight memory allocator for hardware-accelerated machine learning ☆151 Updated 3 months ago
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆65 Updated 8 months ago
- performance experiments for C++ exception handling ☆30 Updated 3 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆119 Updated this week
- pika is a C++ tasking library built on std::execution with fibers, CUDA, HIP, and MPI support. ☆74 Updated last week
- The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) i… ☆14 Updated last year
- A collection of performance analysis tools, recipes, handy scripts, microbenchmarks & more ☆139 Updated 3 weeks ago
- A fast implementation of log() and exp() ☆53 Updated 2 years ago
- Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts ☆216 Updated 8 months ago
- The Berkeley Container Library ☆124 Updated last year
- ☆30 Updated 2 years ago
- An implementation of HIP that works on CPUs, across OSes. ☆121 Updated last year