llmsystem / llmsystem2024spring

CMU 11868 Large Language Model Systems Spring 2024

☆12

Related projects:

Hsword / Awesome-Machine-Learning-System-Papers
☆49 Updated 2 years ago
HPMLL / BurstGPT
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
☆110 Updated last month
lambda7xx / awesome-AI-system
Papers and their code for AI systems
☆202 Updated 3 weeks ago
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆43 Updated 2 months ago
Hsword / SpotServe
SpotServe: Serving Generative Large Language Models on Preemptible Instances
☆92 Updated 6 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. Here is a list of pap…
☆153 Updated this week
LoongServe / LoongServe
☆15 Updated this week
uclasystem / bamboo
Bamboo is a system for running large pipeline-parallel DNNs affordably, reliably, and efficiently using spot instances.
☆46 Updated last year
WukLab / InferCept
☆15 Updated 2 months ago
DicardoX / Research-Space
This repository stores personal notes and annotated papers from daily research.
☆78 Updated last week
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆119 Updated last week
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆89 Updated last week
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆42 Updated 10 months ago
byungsoo-oh / ml-systems-papers
Curated collection of papers in machine learning systems
☆123 Updated last month
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆38 Updated last month
SJTU-IPADS / ugache
☆20 Updated 10 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆76 Updated last year
raywan-110 / AdaQP
Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training
☆18 Updated 6 months ago
msr-fiddle / CheckFreq
☆48 Updated 3 years ago
LiuXiaoxuanPKU / Cost-Model-papers
☆11 Updated last year
SJTU-IPADS / reef
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…
☆84 Updated last year
LiuXiaoxuanPKU / OSD
☆29 Updated last month
adsl-rg / adsl-rg.github.io
☆8 Updated this week
thustorage / PetPS
PetPS: Supporting Huge Embedding Models with Tiered Memory
☆28 Updated 3 months ago
Chen-Binghao / PilotFish
PilotFish harvests the free GPU cycles of cloud gaming with deep learning training
☆13 Updated 2 years ago
SJTU-IPADS / fgnn-artifacts
FGNN's artifact evaluation (EuroSys 2022)
☆17 Updated 2 years ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
☆84 Updated 2 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆67 Updated last week
LLMServe / dLoRA-artifact
☆12 Updated 3 months ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆78 Updated 5 months ago