BBuf / gpu-glossary-zh
https://bbuf.github.io/gpu-glossary-zh/
☆21 · Updated 2 weeks ago
Alternatives and similar repositories for gpu-glossary-zh
Users interested in gpu-glossary-zh are comparing it to the repositories listed below.
- ☆71 · Updated 10 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆67 · Updated 2 weeks ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆82 · Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆115 · Updated 6 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆143 · Updated 2 months ago
- Stateful LLM Serving ☆88 · Updated 8 months ago
- Open ABI and FFI for Machine Learning Systems ☆174 · Updated last week
- ☆90 · Updated 7 months ago
- Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. ☆46 · Updated 3 months ago
- Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆63 · Updated last year
- ☆79 · Updated 3 years ago
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs ☆55 · Updated 3 years ago
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems. ☆24 · Updated last week
- ☆67 · Updated 2 months ago
- A lightweight design for computation-communication overlap. ☆187 · Updated last month
- NEO is an LLM inference engine built to ease the GPU memory crisis via CPU offloading. ☆69 · Updated 5 months ago
- ☆43 · Updated 6 months ago
- ☆47 · Updated 11 months ago
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆49 · Updated this week
- Fast OS-level support for GPU checkpoint and restore ☆257 · Updated last month
- KV cache store for distributed LLM inference ☆361 · Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆168 · Updated 7 months ago
- DeeperGEMM: crazy optimized version ☆73 · Updated 6 months ago
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆103 · Updated 2 years ago
- High-performance Transformer implementation in C++. ☆142 · Updated 10 months ago
- Flash Attention from Scratch on CUDA Ampere ☆63 · Updated 2 months ago
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs ☆64 · Updated 3 months ago
- ☆43 · Updated 4 months ago
- ☆316 · Updated last week
- SOTA Learning-augmented Systems ☆37 · Updated 3 years ago