microsoft / BLAS-on-flash
Linear algebra subroutines for large SSD-resident dense and sparse matrices
☆27 Updated 4 years ago
Alternatives and similar repositories for BLAS-on-flash
Users who are interested in BLAS-on-flash are comparing it to the libraries listed below.
- Artifact for PPoPP 2018 paper "Making Pull-Based Graph Processing Performant" ☆23 Updated 5 years ago
- Pointer-chasing memory benchmark (forked from Doug Pase's code). ☆59 Updated 11 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆30 Updated 9 months ago
- NumaMMA is a lightweight memory profiler for parallel applications ☆29 Updated 2 weeks ago
- LonestarGPU: Irregular algorithms parallelized for GPUs ☆35 Updated 5 years ago
- Artifact Evaluation Reproduction for "Software Prefetching for Indirect Memory Accesses", CGO 2017, using CK. ☆38 Updated 3 years ago
- ☆15 Updated 5 years ago
- Asynchronous Multi-GPU Programming Framework ☆46 Updated 4 years ago
- NUMA-Aware Reader-Writer Locks ☆18 Updated 11 years ago
- A platform to evaluate techniques used in multicore graph processing. ☆37 Updated 6 years ago
- Persistent Collectives X: A collective communication library for high performance, low cost persistent collectives over RDMA devices. ☆14 Updated 6 years ago
- A Micro-benchmarking Tool for HPC Networks ☆31 Updated 5 months ago
- ☆61 Updated 6 years ago
- A GPU-Accelerated In-Memory Key-Value Store (AWS-focused fork) ☆28 Updated 7 years ago
- A NUMA-aware Graph-structured Analytics Framework ☆44 Updated 6 years ago
- OpenSHMEM Reference Implementation over UCX for Specification 1.4 and up ☆36 Updated 2 years ago
- OpenSHMEM Application Programming Interface ☆57 Updated 7 months ago
- OFI Programmer's Guide ☆53 Updated 2 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆56 Updated 2 years ago
- ☆25 Updated 2 years ago
- Sandia OpenSHMEM is an implementation of the OpenSHMEM specification over multiple Networking APIs, including Portals 4, the Open Fabric … ☆70 Updated 2 months ago
- TLB Benchmarks ☆34 Updated 7 years ago
- A hierarchical collective communications library with portable optimizations ☆35 Updated 6 months ago
- Benchmarking In-Memory Index Structures ☆26 Updated 6 years ago
- the Stanford Transactional Applications for Multi-Processing; a benchmark suite for transactional memory research ☆42 Updated 3 years ago
- Multi-GPU dynamic scheduler using PGAS style cross-GPU communication ☆27 Updated last year
- A low-overhead tool to periodically collect system-wide hardware performance counters on Intel64 systems. ☆32 Updated 2 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆33 Updated 2 months ago
- CUDAAdvisor: a GPU profiling tool ☆49 Updated 6 years ago
- A community-oriented list of useful NUMA-related libraries, tools, and other resources ☆69 Updated 4 years ago