szcompressor / cuSZp
Fast GPU error-bounded lossy compressor for floating-point data.
☆34 Updated 3 months ago
Alternatives and similar repositories for cuSZp:
Users who are interested in cuSZp are comparing it to the repositories listed below.
- A GPU accelerated error-bounded lossy compression for scientific data. ☆73 Updated 2 weeks ago
- FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs ☆11 Updated last year
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆50 Updated last year
- ☆36 Updated 2 years ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs ☆15 Updated 4 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆21 Updated last month
- [IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆51 Updated last year
- Thunder Research Group's Collective Communication Library ☆34 Updated 11 months ago
- GEMM and Winograd based convolutions using CUTLASS ☆26 Updated 4 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 5 months ago
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆17 Updated 11 months ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆127 Updated 4 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆85 Updated 2 years ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆14 Updated 4 years ago
- DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression ☆11 Updated 4 years ago
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 Updated 6 years ago
- pytorch-profiler ☆51 Updated last year
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult… ☆41 Updated last year
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs ☆15 Updated 6 years ago
- Artifacts of EVT ASPLOS'24 ☆23 Updated last year
- ☆42 Updated 11 months ago
- Benchmark for matrix multiplications between dense and block sparse (BSR) matrix in TVM, blocksparse (Gray et al.) and cuSparse. ☆24 Updated 4 years ago
- ☆39 Updated 5 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆135 Updated 2 years ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆61 Updated last week
- ☆22 Updated 2 years ago
- ☆17 Updated 4 years ago
- An extension library of WMMA API (Tensor Core API) ☆93 Updated 8 months ago