PanZaifeng / FastTree-Artifact

☆26

Alternatives and similar repositories for FastTree-Artifact

Users who are interested in FastTree-Artifact are comparing it to the libraries listed below.

DD-DuDa / BitLadder
A GPU-optimized system for efficient long-context LLM decoding with low-bit KV cache.
☆60 Updated last week
microsoft / SparTA
☆154 Updated last year
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆221 Updated 2 years ago
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆43 Updated 10 months ago
hao-ai-lab / MuxServe
☆74 Updated 2 weeks ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆182 Updated 3 weeks ago
PanZaifeng / RecFlex
A recommendation model kernel optimizing system
☆12 Updated 4 months ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆59 Updated 7 months ago
LoongServe / LoongServe
☆124 Updated 11 months ago
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆155 Updated last year
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆47 Updated last month
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆30 Updated last year
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆81 Updated 4 months ago
d-matrix-ai / keyformer-llm
☆58 Updated last year
ranggihwang / Pregated_MoE
☆55 Updated last year
HPMLL / SpInfer_EuroSys25
☆28 Updated 6 months ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆117 Updated last month
tsinghua-ideal / Twilight
[NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning
☆51 Updated last week
flashinfer-ai / cutlass-viz
☆65 Updated 6 months ago
YJHMITWEB / ExFlow
Explore Inter-layer Expert Affinity in MoE Model Inference
☆14 Updated last year
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆139 Updated last month
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆252 Updated 2 weeks ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67 Updated 7 months ago
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆92 Updated this week
luliyucoordinate / cute-flash-attention
Implements Flash Attention using CuTe.
☆96 Updated 10 months ago
toyaix / triton-runner
Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.
☆75 Updated last week
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆53 Updated last year
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆74 Updated this week
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆104 Updated 7 months ago