lightsighter / Weft
A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels
☆18 Updated 9 years ago
Alternatives and similar repositories for Weft:
Users who are interested in Weft are comparing it to the libraries listed below.
- This tool serves as a test harness for different optimization techniques to improve stencil computation performance in shared and distri… ☆20 Updated 2 years ago
- Loop Kernel Analysis and Performance Modeling Toolkit ☆92 Updated last week
- LonestarGPU: Irregular algorithms parallelized for GPUs ☆34 Updated 5 years ago
- A task benchmark ☆41 Updated 7 months ago
- ☆53 Updated 5 years ago
- Tools to create performance and roofline plots from measured data ☆58 Updated 10 years ago
- JUPITER Benchmark Suite ☆15 Updated 7 months ago
- Chai ☆43 Updated last year
- CUDAAdvisor: a GPU profiling tool ☆48 Updated 6 years ago
- A Benchmark Suite for Heterogeneous System Computation ☆53 Updated last month
- Compute applications. ☆24 Updated 5 years ago
- A unified framework across multiple programming platforms ☆36 Updated 9 months ago
- Library to plot integer sets and maps ☆49 Updated 8 years ago
- Official BOLT Repository ☆28 Updated 7 months ago
- Development repository for the open earth compiler ☆79 Updated 4 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆23 Updated 6 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ☆32 Updated 4 years ago
- A tool for debugging and assessing floating point precision and reproducibility. ☆74 Updated 2 months ago
- The SparseX sparse kernel optimization library ☆40 Updated 6 years ago
- Flexible GPGPU instrumentation ☆86 Updated 5 years ago
- Comb is a communication performance benchmarking tool. ☆24 Updated 2 years ago
- GPU Code optimizer for stencil computations. Refer to our IPDPS'19 paper for more details ☆24 Updated 5 years ago
- A dynamic analysis tool to detect floating-point errors in HPC applications. ☆33 Updated 2 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆28 Updated 6 months ago
- CUDA Dynamic Memory Allocator for SOA Data Layout ☆35 Updated 3 years ago
- Instantiate the Cache Aware Roofline Model on single socket and multisocket systems. ☆27 Updated 6 years ago
- MPI accelerator-integrated communication extensions ☆32 Updated last year
- Automatically exported from code.google.com/p/patus ☆15 Updated 9 years ago
- TAU Performance System Public Mirror (Updated every night at midnight, USA Pacific Time) ☆42 Updated this week
- A tracing infrastructure for heterogeneous computing applications. ☆31 Updated this week