ml-energy / leaderboard

A canonical source of GenAI energy benchmarks and measurements

☆50

Alternatives and similar repositories for leaderboard

Users who are interested in leaderboard are comparing it to the repositories listed below.

SymbioticLab / Oobleck
A resilient distributed training framework
☆96 Updated last year
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆93 Updated 2 years ago
ml-energy / zeus
Measure and optimize the energy consumption of your AI applications!
☆307 Updated last week
casys-kaist / EnvPipe
☆25 Updated 2 years ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆80 Updated 8 months ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆132 Updated last year
tyler-griggs / melange-release
☆48 Updated last year
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25 Updated last year
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆48 Updated 3 months ago
tanyuqian / redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆68 Updated 11 months ago
UChi-JCL / CacheGen
☆140 Updated last year
hao-ai-lab / MuxServe
☆79 Updated last month
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆64 Updated last year
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆191 Updated last year
WukLab / preble
Stateful LLM Serving
☆88 Updated 8 months ago
DataStates / datastates-llm
LLM checkpointing for DeepSpeed/Megatron
☆21 Updated last month
alibaba / ServeGen
A framework for generating realistic LLM serving workloads
☆79 Updated last month
facebookresearch / ACT
ACT: An Architectural Carbon Modeling Tool for Designing Sustainable Computer Systems
☆43 Updated 4 months ago
DS3Lab / DT-FM
☆93 Updated 3 years ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆43 Updated 3 years ago
thu-pacman / FasterMoE
☆88 Updated 3 years ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆131 Updated 11 months ago
S-Lab-System-Group / Hydro
Surrogate-based Hyperparameter Tuning System
☆27 Updated 2 years ago
Ying1123 / VTC-artifact
☆40 Updated last year
timlee0212 / SiDA-MoE
Code for MLSys 2024 Paper "SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models"
☆21 Updated last year
cornserve-ai / cornserve
Easy, Fast, and Scalable Multimodal AI
☆47 Updated this week
HPMLL / BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆220 Updated 3 months ago
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆35 Updated 2 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆170 Updated last year
yale-sys / prompt-cache
Modular and structured prompt caching for low-latency LLM inference
☆102 Updated last year