JeffersonLab / qphix
QCD for Intel Xeon Phi and Xeon processors
☆13 · Updated 6 months ago
Related projects:
- A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037 ☆39 · Updated 7 months ago
- TAU Performance System Public Mirror (Updated every night at midnight, USA Pacific Time) ☆38 · Updated this week
- Comb is a communication performance benchmarking tool. ☆23 · Updated last year
- The Task-Aware MPI (TAMPI) library extends the functionality of standard MPI libraries by providing new mechanisms for improving the inte… ☆23 · Updated 4 months ago
- RAJA Performance Suite ☆110 · Updated last week
- Next generation library for iterative sparse solvers for ROCm platform ☆74 · Updated this week
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆21 · Updated 6 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆47 · Updated last month
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri… ☆20 · Updated last year
- Training examples for SYCL ☆38 · Updated 6 months ago
- XSBench: The Monte Carlo Macroscopic Cross Section Lookup Benchmark ☆69 · Updated 6 months ago
- Repository to discuss internal hybrid working group issues ☆16 · Updated last year
- Highly Efficient FFT for Exascale ☆35 · Updated 4 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆21 · Updated last week
- High Performance Linpack for Next-Generation AMD HPC Accelerators ☆41 · Updated last week
- HPCG benchmark based on ROCm platform ☆35 · Updated 2 months ago
- Compute applications. ☆25 · Updated 4 years ago
- Pragmatic, Productive, and Portable Affinity for HPC ☆31 · Updated 2 weeks ago
- Loop Kernel Analysis and Performance Modeling Toolkit ☆86 · Updated 2 weeks ago
- The SparseX sparse kernel optimization library ☆39 · Updated 5 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 · Updated 3 years ago
- Fast Fourier transform on GPU in shared memory for AstroAccelerate project ☆24 · Updated 3 years ago
- A GPU performance prediction toolkit for CUDA programs ☆16 · Updated 5 years ago
- MPI accelerator-integrated communication extensions ☆33 · Updated last year
- Logger for MPI communication ☆26 · Updated last year
- Next generation LAPACK implementation for ROCm platform ☆91 · Updated this week