HicrestLaboratory / SPARTA
SParse AcceleRation on Tensor Architecture
☆17, updated last month
Related projects
Alternatives and complementary repositories for SPARTA
- cuASR: CUDA Algebra for Semirings (☆34, updated 2 years ago)
- A GPU performance prediction toolkit for CUDA programs (☆16, updated 5 years ago)
- Sparsity support for PyTorch (☆31, updated this week)
- Distributed Communication-Optimal LU-factorization Algorithm (☆12, updated 3 years ago)
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS (☆49, updated 6 years ago)
- Code for the paper "Design Principles for Sparse Matrix Multiplication on the GPU", accepted to Euro-Par 2018 (☆71, updated 4 years ago)
- NPBench: A Benchmarking Suite for High-Performance NumPy (☆73, updated this week)
- Test suite for probing the numerical behavior of NVIDIA tensor cores (☆30, updated 4 months ago)
- Research and development for optimizing transformers (☆125, updated 3 years ago)
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs (☆29, updated 2 months ago)
- Julia ports of the Rodinia benchmark suite for heterogeneous computing infrastructures (☆48, updated last year)
- A Python library that transfers PyTorch tensors between CPU and NVMe (☆98, updated this week)
- CUDA 12.2 HMM demos (☆17, updated 3 months ago)
- Custom-precision floating-point numbers (☆29, updated 5 months ago)
- MagmaDNN: a simple deep learning framework in C++ (☆45, updated 4 years ago)
- A Data-Centric Compiler for Machine Learning (☆82, updated 10 months ago)
- Memory Optimizations for Deep Learning (ICML 2023) (☆60, updated 8 months ago)
- A library of GPU kernels for sparse matrix operations (☆249, updated 4 years ago)
- Round matrix elements to lower precision in MATLAB (☆35, updated 2 years ago)
- GPU Performance Advisor (☆63, updated 2 years ago)
- 🎃 GPU load-balancing library for regular and irregular computations (☆58, updated 5 months ago)
- Cavs: An Efficient Runtime System for Dynamic Neural Networks (☆13, updated 4 years ago)
- FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme (☆46, updated 2 months ago)
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions (☆22, updated 3 weeks ago)
- A searchable Python interface to the SuiteSparse Matrix Collection (☆42, updated 2 years ago)