BBuf / gpu-glossary-zh
https://bbuf.github.io/gpu-glossary-zh/
☆21 · Updated 2 weeks ago
Alternatives and similar repositories for gpu-glossary-zh
Users interested in gpu-glossary-zh are comparing it to the repositories listed below.
- ☆71 · Updated 10 months ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆67 · Updated 2 weeks ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆82 · Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆115 · Updated 6 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer ☆143 · Updated 2 months ago
- Stateful LLM Serving ☆88 · Updated 8 months ago
- Open ABI and FFI for Machine Learning Systems ☆174 · Updated last week
- ☆90 · Updated 7 months ago
- Research prototype of PRISM — a cost-efficient multi-LLM serving system with flexible time- and space-based GPU sharing. ☆46 · Updated 3 months ago
- Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆63 · Updated last year
- ☆79 · Updated 3 years ago
- High-performance RDMA-based distributed feature collection component for training GNN models on extremely large graphs ☆55 · Updated 3 years ago
- PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems. ☆24 · Updated last week
- ☆67 · Updated 2 months ago
- A lightweight design for computation-communication overlap. ☆187 · Updated last month
- NEO is an LLM inference engine built to ease the GPU memory crisis via CPU offloading. ☆69 · Updated 5 months ago
- ☆43 · Updated 6 months ago
- ☆47 · Updated 11 months ago
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆49 · Updated this week
- Fast OS-level support for GPU checkpoint and restore ☆257 · Updated last month
- KV cache store for distributed LLM inference ☆361 · Updated last week
- DeepSeek-V3/R1 inference performance simulator ☆168 · Updated 7 months ago
- DeeperGEMM: crazy optimized version ☆73 · Updated 6 months ago
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆103 · Updated 2 years ago
- High-performance Transformer implementation in C++. ☆142 · Updated 10 months ago
- Flash Attention from Scratch on CUDA Ampere ☆63 · Updated 2 months ago
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs ☆64 · Updated 3 months ago
- ☆43 · Updated 4 months ago
- ☆316 · Updated last week
- SOTA Learning-augmented Systems ☆37 · Updated 3 years ago