lutnn / blink-mmLinks

☆15

Alternatives and similar repositories for blink-mm

Users that are interested in blink-mm are comparing it to the libraries listed below

Sorting:

microsoft / SparTA
☆150Updated last year
ranggihwang / Pregated_MoE
☆51Updated last year
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆33Updated last year
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆87Updated last month
naver-aics / lut-gemm
☆68Updated last year
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆218Updated last year
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆57Updated 5 months ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89Updated 2 years ago
SNU-ARC / flashneuron
☆39Updated 2 years ago
tonyzhang617 / nomad-dist
☆38Updated last year
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆224Updated last month
ParCIS / FlashSparse
FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…
☆29Updated last month
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆13Updated last year
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆43Updated 8 months ago
Azure / msccl
Microsoft Collective Communication Library
☆67Updated 9 months ago
PrincetonUniversity / LLMCompass
☆181Updated last year
d-matrix-ai / keyformer-llm
☆54Updated last year
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆110Updated 8 months ago
parasailteam / coconet
☆81Updated 2 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆160Updated last week
casys-kaist / EnvPipe
☆25Updated 2 years ago
sitar-lab / NeuSight
☆50Updated 2 months ago
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆53Updated last year
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆47Updated 3 weeks ago
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆135Updated last month
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆152Updated last year
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46Updated last year
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implement of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆68Updated 2 months ago
LeiWang1999 / Stream-k.tvm
☆19Updated 11 months ago
ZhangJingrong / gpu_topK_benchmark
GPU TopK Benchmark
☆15Updated 8 months ago