microsoft / SparTA

☆150

Alternatives and similar repositories for SparTA

Users who are interested in SparTA are comparing it to the libraries listed below.

AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆216 Updated last year
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆52 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆154 Updated last month
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆137 Updated 2 years ago
naver-aics / lut-gemm
☆64 Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114 Updated this week
DD-DuDa / BitDecoding
A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache.
☆56 Updated last week
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆50 Updated 4 months ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67 Updated 4 months ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆91 Updated 2 years ago
Guangxuan-Xiao / torch-int
This repository contains integer operators on GPUs for PyTorch.
☆208 Updated last year
usyd-fsalab / fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
☆260 Updated 2 weeks ago
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆80 Updated 3 weeks ago
AlibabaPAI / FLASHNN
☆96 Updated 10 months ago
zhuohan123 / terapipe
☆75 Updated 4 years ago
ColfaxResearch / cutlass-kernels
☆227 Updated last year
parasailteam / coconet
☆80 Updated 2 years ago
wangsiping97 / FastGEMV
High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.
☆113 Updated last year
osayamenja / Kleos
Complete GPU residency for ML.
☆37 Updated this week
nox-410 / tvm.tl
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
☆50 Updated last year
ranggihwang / Pregated_MoE
☆49 Updated last year
LoongServe / LoongServe
☆109 Updated 8 months ago
hgyhungry / ShflBW_Sparse_NN
☆16 Updated 2 years ago
luliyucoordinate / cute-flash-attention
Implement Flash Attention using CuTe.
☆92 Updated 7 months ago
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆42 Updated 7 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆106 Updated 2 months ago
hao-ai-lab / MuxServe
☆65 Updated last year
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 6 months ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆38 Updated 4 months ago