cwang9208 / NPB
NAS Parallel Benchmarks
☆8 Updated 7 years ago
Alternatives and similar repositories for NPB:
Users who are interested in NPB compare it to the repositories listed below.
- Stencil Probe - a stencil microbenchmark ☆30 Updated 12 years ago
- The NAS Parallel Benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures ☆51 Updated 2 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 5 months ago
- ☆34 Updated 3 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆28 Updated 6 months ago
- XSBench: The Monte Carlo Macroscopic Cross Section Lookup Benchmark ☆78 Updated last year
- Suite of contentious microbenchmarks ☆54 Updated 8 years ago
- ☆17 Updated 2 years ago
- Benchmark for Linux servers ☆13 Updated 8 years ago
- This package includes the implementation for four sparse linear algebra kernels: Sparse-Matrix-Vector-Multiplication (SpMV), Sparse-Trian… ☆26 Updated 4 years ago
- A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves (SpTRSV) ☆21 Updated 5 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆24 Updated 6 years ago
- ☆40 Updated 7 years ago
- Characterizing and Modeling Non-Volatile Memory Systems [MICRO'20, TopPicks'21] ☆33 Updated 3 years ago
- Measure instruction latency and throughput ☆23 Updated last month
- Multiple 1-stencil implementations using NVIDIA CUDA. ☆13 Updated 7 years ago
- Clio, ASPLOS'22. ☆73 Updated 3 years ago
- Slides and exercises for persistent memory programming tutorial ☆12 Updated 2 years ago
- This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated… ☆31 Updated last year
- ☆43 Updated 4 years ago
- Performance Prediction Toolkit ☆51 Updated 3 months ago
- Near-optimal Prefetching System ☆33 Updated 3 years ago
- A Shared Memory Multithreaded Graph Benchmark Suite for Multicores ☆35 Updated 2 years ago
- C++/MPI proxies for distributed training of deep neural networks. ☆13 Updated 2 years ago
- Logger for MPI communication ☆26 Updated last year
- Prefetching and efficient data path for memory disaggregation ☆67 Updated 4 years ago
- Cluster Far Mem, framework to execute single job and multi job experiments using fastswap ☆21 Updated last year
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication ☆28 Updated last year
- A Micro-benchmarking Tool for HPC Networks ☆26 Updated 2 months ago
- A GPU FP32 computation method with Tensor Cores. ☆20 Updated 2 years ago