PKU-DAIR / Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

☆328

Alternatives and similar repositories for Hetu

Users who are interested in Hetu are comparing it to the libraries listed below


Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆123 · Updated last year
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147 · Updated 3 years ago
lambda7xx / awesome-AI-system
Papers and their code for AI systems
☆339 · Updated 3 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆282 · Updated 8 months ago
kwai / Megatron-Kwai
LLM training technologies developed by kwai
☆66 · Updated last week
Jack47 / hack-SysML
The road to hacking SysML and becoming a systems expert
☆500 · Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆120 · Updated 2 months ago
thu-pacman / SmartMoE-AE
ATC '23 artifact evaluation (AE)
☆47 · Updated 2 years ago
HPDL-Group / Merak
☆81 · Updated 6 months ago
Shenggan / awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
☆253 · Updated last year
LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆737 · Updated 7 months ago
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆91 · Updated 2 years ago
cli99 / llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
☆466 · Updated 7 months ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆437 · Updated 6 months ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆270 · Updated 2 years ago
LoongServe / LoongServe
☆124 · Updated last year
thu-pacman / FasterMoE
☆88 · Updated 3 years ago
alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆98 · Updated 2 years ago
HPMLL / BurstGPT
A ChatGPT (GPT-3.5) and GPT-4 workload trace for optimizing LLM serving systems
☆220 · Updated 4 months ago
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆165 · Updated this week
AmadeusChan / Awesome-LLM-System-Papers
☆610 · Updated 6 months ago
TreeAI-Lab / Awesome-KV-Cache-Management
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co…
☆255 · Updated 4 months ago
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76 · Updated 4 years ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆490 · Updated 8 months ago
zhuohan123 / terapipe
☆77 · Updated 4 years ago
Relaxed-System-Lab / HexGen
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
☆31 · Updated last year
stepfun-ai / StepMesh
☆324 · Updated 3 weeks ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆224 · Updated 2 years ago
microsoft / vidur
A large-scale simulation framework for LLM inference
☆488 · Updated 4 months ago
madsys-dev / deepseekv2-profile
☆152 · Updated 9 months ago