ranggihwang / Pregated_MoE

☆55

Alternatives and similar repositories for Pregated_MoE

Users that are interested in Pregated_MoE are comparing it to the libraries listed below

pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24)
☆43Updated 10 months ago
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆94Updated 3 months ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59Updated 7 months ago
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆14Updated last year
snu-comparch / Tender
Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization (ISCA'24)
☆21Updated last year
naver-aics / lut-gemm
☆76Updated last year
DD-DuDa / BitLadder
A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache.
☆60Updated this week
microsoft / SparTA
☆153Updated last year
goliaro / specinfer-ae
☆24Updated last year
PrincetonUniversity / LLMCompass
☆196Updated this week
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆155Updated last year
d-matrix-ai / keyformer-llm
☆59Updated last year
VITA-Group / Q-Hitter
☆15Updated last year
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆144Updated 3 months ago
AIS-SNU / Smart-Infinity
[HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System
☆49Updated 3 months ago
upmem / upmem_llm_framework
UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating them with a given hardware profile.
☆36Updated 2 months ago
VIA-Research / vTrain
☆73Updated 5 months ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55Updated last year
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆33Updated last year
LoongServe / LoongServe
☆124Updated 11 months ago
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆89Updated 3 weeks ago
ParCIS / FlashSparse
FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…
☆31Updated 3 weeks ago
hao-ai-lab / MuxServe
☆74Updated last week
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆45Updated last month
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆53Updated last year
thu-ml / Jetfire-INT8Training
☆58Updated last year
sitar-lab / NeuSight
☆53Updated 4 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116Updated last year
SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆118Updated 3 months ago
monellz / FlashTensor
☆16Updated 7 months ago