dian-lun-lin / SNIG
SNIG: Accelerated Large Sparse Neural Network Inference using Task Graph Parallelism
☆34Updated 3 years ago
Related projects
Alternatives and complementary repositories for SNIG
- An extension library of WMMA API (Tensor Core API)☆84Updated 4 months ago
- ☆38Updated 4 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆116Updated 4 years ago
- TLB Benchmarks☆32Updated 7 years ago
- ☆44Updated 5 years ago
- 🎃 GPU load-balancing library for regular and irregular computations.☆57Updated 5 months ago
- Chinese translation of the CUDA PTX-ISA document☆26Updated 8 months ago
- Artifact of the ASPLOS'23 paper "GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference"☆16Updated last year
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows☆55Updated 7 months ago
- Heterogeneous Programming☆17Updated last year
- ☆29Updated 2 years ago
- ☆32Updated 2 years ago
- A language and compiler for irregular tensor programs.☆133Updated 6 months ago
- Development repository for the Open Earth Compiler☆77Updated 3 years ago
- ☆41Updated 4 years ago
- Concurrent CPU-GPU Programming using Task Models☆100Updated 4 years ago
- GPU TopK Benchmark☆14Updated last year
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.☆81Updated 2 years ago
- ☆80Updated 7 months ago
- Source code for matrix multiplication implementations on CUDA☆35Updated 6 years ago
- GPU Performance Advisor☆63Updated 2 years ago
- Study of Ampere's sparse matmul☆14Updated 3 years ago
- A method for efficiently processing SpMV using SIMD and load balancing☆16Updated 2 years ago
- MLIR Sample dialect☆103Updated last month
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆31Updated 4 years ago
- A pattern-based algorithmic autotuner for graph processing on GPUs.☆30Updated last year
- ☆40Updated 3 years ago
- End to End steps for adding custom ops in PyTorch.☆19Updated 4 years ago
- Dissecting NVIDIA GPU Architecture☆82Updated 2 years ago
- Multiple 1-stencil implementations using NVIDIA CUDA.☆13Updated 6 years ago