ChenhanYu / hmlp
High-Performance Machine Learning Primitives
☆10, updated 3 years ago
Related projects
Alternatives and complementary repositories for hmlp
- HiCMA: Hierarchical Computations on Manycore Architectures (☆28, updated last year)
- Experimental Linear Algebra Performance Studies (☆12, updated 7 years ago)
- Tensor Contraction Code Generator (☆36, updated 7 years ago)
- The SparseX sparse kernel optimization library (☆39, updated 5 years ago)
- Sparse matrix preprocessing library (☆81, updated 6 months ago)
- Parallel Tensor Infrastructure (ParTI!) (☆28, updated 4 years ago)
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures (☆48, updated last year)
- Communication Avoiding Numerical Dense Matrix Computations (☆11, updated 3 years ago)
- CUDA and OpenMP implementations of C2R/R2C in-place transposition (☆45, updated 9 years ago)
- Error-Free Transformations as building blocks for compensated algorithms (☆14, updated last year)
- A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037 (☆39, updated 9 months ago)
- cuASR: CUDA Algebra for Semirings (☆34, updated 2 years ago)
- A heterogeneous multi-GPU level-3 BLAS library (☆45, updated 4 years ago)
- QMCPACK miniapp: a simplified real-space QMC code for algorithm development, performance portability testing, and computer science experi… (☆27, updated 3 months ago)
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward (☆20, updated 6 years ago)
- Loop Kernel Analysis and Performance Modeling Toolkit (☆89, updated 2 months ago)
- MiniFE Finite Element Mini-Application (☆29, updated 6 months ago)
- Distributed Communication-Optimal LU-factorization Algorithm (☆12, updated 3 years ago)
- A dynamic analysis tool to detect floating-point errors in HPC applications (☆33, updated 2 years ago)
- Recursive LAPACK Collection (☆42, updated 2 years ago)
- TTC: A high-performance Compiler for Tensor Transpositions (☆20, updated 7 years ago)
- Fork of magma to include more BLAS (☆28, updated 7 years ago)
- A GPU performance prediction toolkit for CUDA programs (☆16, updated 5 years ago)
- Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction (☆65, updated last month)
- Autonomic Performance Environment for eXascale (APEX) (☆38, updated 3 weeks ago)
- A C++ library for computing large-scale tensor contractions (☆36, updated 6 years ago)
- This tool serves as a test harness for different optimization techniques to improve stencil computations performance in shared and distri… (☆20, updated 2 years ago)