microsoft / BLAS-on-flash
Linear algebra subroutines for large SSD-resident dense and sparse matrices
☆27Updated 4 years ago
Alternatives and similar repositories for BLAS-on-flash:
Users who are interested in BLAS-on-flash are comparing it to the libraries listed below:
- Artifact for PPoPP 2018 paper "Making Pull-Based Graph Processing Performant"☆23Updated 4 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs☆28Updated 5 months ago
- GPUDirect Async support for IB Verbs☆100Updated 2 years ago
- A low-overhead tool to periodically collect system-wide hardware performance counters on Intel64 systems.☆31Updated 2 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆30Updated 2 months ago
- Instantiate the Cache Aware Roofline Model on single-socket and multi-socket systems.☆27Updated 5 years ago
- Persistent Collectives X: A collective communication library for high-performance, low-cost persistent collectives over RDMA devices.☆14Updated 6 years ago
- User-space Page Management☆106Updated 6 months ago
- LonestarGPU: Irregular algorithms parallelized for GPUs☆33Updated 5 years ago
- Pointer-chasing memory benchmark (forked from Doug Pase's code).☆59Updated 11 years ago
- Collective library☆8Updated 4 years ago
- ☆17Updated 2 years ago
- A Micro-benchmarking Tool for HPC Networks☆25Updated last month
- A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs☆74Updated last year
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019☆56Updated 2 years ago
- A hierarchical collective communications library with portable optimizations☆29Updated 2 months ago
- Pannotia v0.9 is a suite of OpenCL graph applications☆23Updated 7 years ago
- Artifact Evaluation Reproduction for "Software Prefetching for Indirect Memory Accesses", CGO 2017, using CK.☆38Updated 3 years ago
- A platform to evaluate techniques used in multicore graph processing.☆37Updated 6 years ago
- Tools to create performance and roofline plots from measured data☆58Updated 10 years ago
- A GPU-Accelerated In-Memory Key-Value Store (AWS-focused fork)☆28Updated 7 years ago
- OFI Programmer's Guide☆52Updated 2 years ago
- pytorch ucc plugin☆18Updated 3 years ago
- A library for constructing allocators and memory pools. It also contains broadly useful abstractions and utilities for memory management.…☆52Updated this week
- A NUMA-aware Graph-structured Analytics Framework☆42Updated 6 years ago
- TLB Benchmarks☆33Updated 7 years ago
- NUMA-Aware Reader-Writer Locks☆18Updated 10 years ago
- NUMAPROF is a NUMA memory profiler based on Pintool to track your remote memory accesses.☆46Updated 7 months ago
- ☆52Updated 5 years ago
- A Shared Memory Multithreaded Graph Benchmark Suite for Multicores☆34Updated 2 years ago