thu-pacman / FasterMoELinks

☆87

Alternatives and similar repositories for FasterMoE

Users that are interested in FasterMoE are comparing it to the libraries listed below

Sorting:

zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆92Updated 2 years ago
thu-pacman / SmartMoE-AE
ATC23 AE
☆47Updated 2 years ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆221Updated 2 years ago
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆118Updated last month
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆216Updated last year
d-matrix-ai / keyformer-llm
☆59Updated last year
zhuohan123 / terapipe
☆75Updated 4 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆67Updated 7 months ago
LoongServe / LoongServe
☆124Updated 11 months ago
microsoft / chunk-attention
☆78Updated 6 months ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆118Updated 6 months ago
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆30Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆124Updated 4 months ago
EfficientMoE / MoE-Infinity
PyTorch library for cost-effective, fast and easy serving of MoE models.
☆252Updated last week
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆60Updated 11 months ago
hao-ai-lab / MuxServe
☆74Updated this week
FMInference / DejaVu
☆341Updated last year
SymbioticLab / Oobleck
A resilient distributed training framework
☆95Updated last year
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆105Updated 6 months ago
Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆121Updated last year
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆169Updated last year
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆41Updated 2 years ago
dilab-zju / self-speculative-decoding
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
☆203Updated 8 months ago
madsys-dev / deepseekv2-profile
☆148Updated 7 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆278Updated 7 months ago
microsoft / SparTA
☆152Updated last year
InternLM / Awesome-LLM-Training-System
☆43Updated last year
zcli-charlie / Awesome-KV-Cache
☆79Updated last year
LiuXiaoxuanPKU / OSD
☆60Updated 10 months ago
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆65Updated last year