ScalableMachinesResearch / JXPerf
Java inefficiency detection tool based on CPU performance monitoring counters and hardware debug registers. The tool detects dead writes, silent stores, and redundant loads.
☆45 Updated 3 years ago
Alternatives and similar repositories for JXPerf:
Users who are interested in JXPerf are comparing it to the repositories listed below:
- Light-weight Performance Variance Detection for Production-run Parallel Applications ☆12 Updated last year
- GVProf: A Value Profiler for GPU-based Clusters ☆49 Updated 11 months ago
- Artifact Evaluation Reproduction for "Software Prefetching for Indirect Memory Accesses", CGO 2017, using CK. ☆38 Updated 3 years ago
- CUDAAdvisor: a GPU profiling tool ☆48 Updated 6 years ago
- GPU Performance Advisor ☆64 Updated 2 years ago
- Thinking is hard - automate it ☆19 Updated 2 years ago
- Horizontal Fusion ☆22 Updated 3 years ago
- ngAP's artifact for ASPLOS'24 ☆20 Updated last month
- Out-of-GPU-Memory Graph Processing with Minimal Data Transfer ☆53 Updated 2 years ago
- A pattern-based algorithmic autotuner for graph processing on GPUs. ☆30 Updated 2 months ago
- SpV8 is a SpMV kernel written in AVX-512. Artifact for our SpV8 paper @ DAC '21. ☆27 Updated 3 years ago
- Evaluating different memory managers for dynamic GPU memory ☆25 Updated 4 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆28 Updated 5 months ago
- Chai ☆42 Updated last year
- Instantiate the Cache Aware Roofline Model on single socket and multisocket systems. ☆27 Updated 6 years ago
- Source code of the simulator used in the Mosaic paper from MICRO 2017: "Mosaic: A GPU Memory Manager with Application-Transparent Support… ☆43 Updated 6 years ago
- Efficient-Tensor-Management-on-HM-for-Deep-Learning ☆9 Updated 3 years ago
- A low-overhead tool to periodically collect system-wide hardware performance counters on Intel64 systems. ☆31 Updated 2 years ago
- A Shared Memory Multithreaded Graph Benchmark Suite for Multicores ☆34 Updated 2 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves (SpTRSV) ☆21 Updated 5 years ago
- TLB Benchmarks ☆33 Updated 7 years ago
- Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL) ☆24 Updated last year
- A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling. ☆41 Updated 2 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆64 Updated 6 years ago