berenger-eu / avx-512-sort
Fast AVX512 (AVX-512) quicksort + bitonic sort.
☆27 · Updated 2 years ago
Alternatives and similar repositories for avx-512-sort:
Users interested in avx-512-sort are comparing it to the libraries listed below.
- InstLatX64_Demo ☆41 · Updated last month
- immintrin_dbg.h is an include file, a wrapper around immintrin.h. It implements most of AVX, AVX2, AVX-512 vector intrinsics to enable so… ☆57 · Updated 2 years ago
- AVX512F and AVX2 versions of quick sort ☆105 · Updated 7 years ago
- User-space Page Management ☆106 · Updated 6 months ago
- A small library and kernel module for easy access to x86 performance monitor counters under Linux. ☆98 · Updated 9 months ago
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) API ☆96 · Updated this week
- This is a mirror of the official libpfm4 git repository, https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/ with some local branc… ☆56 · Updated 3 months ago
- Ocolos is the first online code layout optimization system for unmodified applications written in unmanaged languages. ☆52 · Updated last year
- ☆56 · Updated last week
- CERE: Codelet Extractor and REplayer ☆40 · Updated last year
- Testing memory-level parallelism ☆67 · Updated 11 months ago
- Parallel Memory Bandwidth Measurement / Benchmark Tool ☆106 · Updated 2 years ago
- A Benchmark Toolkit for Assembly Instructions Using the LLVM JIT ☆16 · Updated 4 years ago
- Quicksilver superpage management system ☆11 · Updated 3 years ago
- ☆35 · Updated 7 months ago
- ☆20 · Updated last year
- Very low-overhead timer/counter interfaces for C on Intel 64 processors. ☆121 · Updated 5 years ago
- Predator: Predictive False Sharing Detection ☆21 · Updated 10 years ago
- Benchmarks for auto-vectorization and revectorization, including both hand-vectorized and scalar code ☆27 · Updated 6 years ago
- ☆26 · Updated 3 years ago
- Haystack is an analytical cache model that, given a program, computes the number of cache misses. ☆45 · Updated 5 years ago
- ROB size testing utility ☆142 · Updated 3 years ago
- Library with JIT (Just-in-time) compilation support to optimize performance of small and medium matrix multiplication ☆14 · Updated 3 years ago
- Generic Automatic Parallel Profiler ☆34 · Updated 4 years ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆37 · Updated 9 years ago
- Linux Cross-Memory Attach ☆90 · Updated 5 months ago
- Intel® Query Processing Library (Intel® QPL) ☆100 · Updated this week
- ssmem is a simple object-based memory allocator with epoch-based garbage collection ☆34 · Updated 8 years ago
- Montage is a system for building fast buffered persistent data structures on nonvolatile memory. ☆15 · Updated 2 years ago
- Code used for generating charts and measurements of nontemporal stores ☆9 · Updated 6 years ago