ChenhanYu / hmlp
High-Performance Machine Learning Primitives
☆11 Updated 3 years ago
Alternatives and similar repositories for hmlp:
Users who are interested in hmlp are comparing it to the libraries listed below.
- HiCMA: Hierarchical Computations on Manycore Architectures ☆30 Updated last year
- sparse matrix pre-processing library ☆81 Updated 8 months ago
- Experimental Linear Algebra Performance Studies ☆12 Updated 7 years ago
- The SparseX sparse kernel optimization library ☆39 Updated 6 years ago
- Tensor Contraction Code Generator ☆36 Updated 7 years ago
- bhSPARSE: A Sparse BLAS Library ☆16 Updated 9 years ago
- ☆15 Updated 3 years ago
- Parallel Tensor Infrastructure (ParTI!) ☆28 Updated 4 years ago
- This package includes the implementation for Sparse-Matrix-Vector-Multiplication (SpMV) and Sparse-Matrix-Matrix-Multiplication (SpMM) fo… ☆10 Updated 4 years ago
- Software libraries that implement hierarchical matrices ☆56 Updated 9 months ago
- Distributed-memory, arbitrary-precision, dense and sparse-direct linear algebra, conic optimization, and lattice reduction ☆65 Updated 3 months ago
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆69 Updated 2 years ago
- Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support ☆48 Updated last week
- H2Opus: a performance-oriented library for hierarchical matrices ☆13 Updated 2 years ago
- A Massively Parallel FFT Library for CPU/GPU ☆54 Updated 4 years ago
- MagmaDNN: a simple deep learning framework in C++ ☆48 Updated 4 years ago
- Error-Free Transformations as building blocks for compensated algorithms ☆14 Updated last year
- A C++ library for computing large scale tensor contractions. ☆36 Updated 6 years ago
- Parallel Algorithms for Octree Meshing ☆12 Updated 9 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆20 Updated 6 years ago
- H2Lib public repository ☆52 Updated 2 years ago
- PFASST++ is a C++ implementation of the "parallel full approximation scheme in space and time" (PFASST) algorithm ☆32 Updated 8 years ago
- A proxy app for the Monte Carlo Transport Code, Mercury. LLNL-CODE-684037 ☆38 Updated 11 months ago
- Autonomic Performance Environment for eXascale (APEX) ☆42 Updated this week
- CUDA and OpenMP implementations of C2R/R2C inplace transposition ☆46 Updated 9 years ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 3 years ago
- Zoltan Dynamic Load Balancing and Graph Algorithm Toolkit -- Distribution site ☆34 Updated last year
- ☆13 Updated 2 years ago
- A dynamic analysis tool to detect floating-point errors in HPC applications. ☆33 Updated 2 years ago
- QMCPACK miniapp: a simplified real space QMC code for algorithm development, performance portability testing, and computer science experi… ☆27 Updated 5 months ago