intelligent-machine-learning / glake

GLake: optimizing GPU memory management and IO transmission.

☆375

Related projects

Alternatives and complementary repositories for glake

LLMServe / DistServe
Disaggregated serving system for Large Language Models (LLMs).
☆348 Updated 2 months ago
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆192 Updated this week
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆541 Updated 3 weeks ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆264 Updated last year
pcg-mlp / KsanaLLM
☆282 Updated last week
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆222 Updated this week
FlagOpen / FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
☆328 Updated this week
volcengine / veScale
A PyTorch Native LLM Training Framework
☆661 Updated 2 months ago
hahnyuan / LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod…
☆310 Updated last month
OpenPPL / ppl.llm.serving
☆123 Updated this week
alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆90 Updated last year
bytedance / flux
A fast communication-overlapping library for tensor parallelism on GPUs.
☆217 Updated last week
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆231 Updated last month
FlagOpen / FlagAttention
A collection of memory-efficient attention operators implemented in the Triton language.
☆215 Updated 5 months ago
bytedance / ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
☆457 Updated 7 months ago
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆200 Updated 3 weeks ago
bytedance / byteir
A model compilation solution for various hardware
☆377 Updated this week
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆167 Updated this week
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆284 Updated 2 years ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆279 Updated this week
OpenPPL / ppl.llm.kernel.cuda
☆136 Updated this week
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆815 Updated 2 months ago
OpenPPL / ppl.nn.llm
☆140 Updated 6 months ago
lambda7xx / awesome-AI-system
Papers and their code for AI systems
☆210 Updated 2 months ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆246 Updated this week
galeselee / Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…
☆166 Updated this week
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆287 Updated last month
zw0610 / zw0610.github.io
☆55 Updated 4 years ago
Bruce-Lee-LY / cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instruct…
☆296 Updated 2 months ago
microsoft / vidur
A large-scale simulation framework for LLM inference
☆271 Updated 3 weeks ago