A highly efficient library for GEMM operations on Sunway TaihuLight
☆18Sep 7, 2020Updated 5 years ago
Alternatives and similar repositories for swGEMM
Users who are interested in swGEMM are comparing it to the libraries listed below
- A Deep Learning Framework customized for Sunway TaihuLight☆41Jan 8, 2019Updated 7 years ago
- A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves (SpTRSV)☆22Feb 14, 2020Updated 6 years ago
- some demos for cpc☆12Jun 9, 2018Updated 7 years ago
- CUDA GPU implementation of GMRES iterative Solver☆10Apr 16, 2012Updated 13 years ago
- Some "Formula Translations" for Yousef Saad's book "Iterative Methods for Sparse Linear Systems (2nd Edition)"☆13Jan 14, 2018Updated 8 years ago
- ☆46Jun 19, 2024Updated last year
- This repository provides code for SVD and Importance sampling-based algorithms for large scale topic modeling.☆15Dec 14, 2020Updated 5 years ago
- Rewrite OpenGFW in Rust, with web-ui.☆17Mar 3, 2025Updated 11 months ago
- ☆12Jan 19, 2020Updated 6 years ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.☆10Feb 10, 2022Updated 4 years ago
- ☆12Jan 13, 2023Updated 3 years ago
- GPT Demo with hybrid distributed training☆10Dec 1, 2022Updated 3 years ago
- SQL Optimizations using MLIR☆12Apr 5, 2020Updated 5 years ago
- Let's count triangles together!☆10Jun 27, 2024Updated last year
- Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y…☆46May 22, 2024Updated last year
- A sparse BLAS lib supporting multiple backends☆50Nov 23, 2025Updated 3 months ago
- ☆49Sep 5, 2020Updated 5 years ago
- benchmark for linux server☆13Nov 6, 2016Updated 9 years ago
- The IBM Hyper Protect iOS SDK for CareKit is an addon for the CareKit framework that consumes IBM Hyper Protect Services for zero-trust p…☆13Sep 2, 2020Updated 5 years ago
- Finals of CPC2018, the 2nd Domestic CPU Parallel Application Challenge☆11Oct 26, 2018Updated 7 years ago
- Depict GPU memory footprint during DNN training of PyTorch☆11Nov 17, 2022Updated 3 years ago
- ☆10Aug 4, 2020Updated 5 years ago
- Drawing Comparison Figures in Scientific Research Papers, includes lines and bars.☆11Mar 22, 2024Updated last year
- Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication☆15Mar 25, 2024Updated last year
- Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides (SpTRSM)☆14Feb 14, 2020Updated 6 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Jun 16, 2017Updated 8 years ago
- ☆14Mar 18, 2022Updated 3 years ago
- Memory latency test☆13May 14, 2024Updated last year
- ☆14Apr 5, 2023Updated 2 years ago
- Linux kernel SGX driver for Graphene☆12Nov 3, 2020Updated 5 years ago
- Drop-in library for tracking the memory allocations of CUDA applications☆14Nov 17, 2017Updated 8 years ago
- High-Performance Machine Learning Primitives☆13Apr 17, 2021Updated 4 years ago
- Jpak compression format☆15Mar 12, 2017Updated 8 years ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- A Generic Resource-Aware Hyperparameter Tuning Execution Engine☆15Jan 8, 2022Updated 4 years ago
- A suite of stochastic optimization methods for solving the empirical risk minimization problem.☆17Nov 20, 2019Updated 6 years ago
- ☆17Jun 24, 2024Updated last year
- Keystone security monitor library for opensbi (Discontinued after monorepo-izing)☆13Oct 28, 2022Updated 3 years ago
- Automatically exported from code.google.com/p/sse2neon☆11Mar 16, 2020Updated 5 years ago