sunkx109 / GPUs-Specs

Summary of the Specs of Commonly Used GPUs for Training and Inference of LLMs

☆67

Alternatives and similar repositories for GPUs-Specs

Users who are interested in GPUs-Specs compare it to the repositories listed below.

zartbot / shallowsim
DeepSeek-V3/R1 inference performance simulator
☆169 Updated 8 months ago
shenh10 / DeepSeek_Simulator
☆90 Updated 8 months ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆139 Updated this week
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆118 Updated 6 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆190 Updated last month
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆142 Updated 10 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆144 Updated 2 months ago
galeselee / Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…
☆282 Updated 9 months ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147 Updated 3 years ago
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆254 Updated 5 months ago
sonnyli / flash_attention_from_scratch
Flash Attention from Scratch on CUDA Ampere
☆76 Updated 3 months ago
CalebDu / Awesome-Cute
☆112 Updated 6 months ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆119 Updated last year
gty111 / gLLM
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
☆51 Updated this week
ArthurinRUC / cutlass-notes
From Minimal GEMM to Everything
☆82 Updated 3 weeks ago
toyaix / triton-runner
Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.
☆76 Updated last week
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆82 Updated this week
apache / tvm-ffi
Open ABI and FFI for Machine Learning Systems
☆211 Updated this week
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆82 Updated this week
AlibabaResearch / mononn
☆32 Updated last year
zhaiyi000 / tlm
☆45 Updated last year
alibaba / llm-scheduling-artifact
Artifact of the OSDI '24 paper “Llumnix: Dynamic Scheduling for Large Language Model Serving”
☆64 Updated last year
OpenPPL / ppl.llm.kernel.cuda
☆152 Updated 10 months ago
interestingLSY / swiftLLM
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of …
☆292 Updated 5 months ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆190 Updated 10 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆122 Updated last year
AlibabaPAI / FLASHNN
☆102 Updated last year
heheda12345 / MagPy
☆40 Updated last year
kwai / Megatron-Kwai
LLM training technologies developed by Kwai
☆66 Updated last week
harleyszhang / llm_counts
Theoretical LLM performance analysis tools supporting parameter, FLOPs, memory, and latency analysis.
☆113 Updated 4 months ago