AlibabaPAI / torchacc

PyTorch distributed training acceleration framework

☆51

Alternatives and similar repositories for torchacc

Users who are interested in torchacc are comparing it to the libraries listed below.

infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆104 Updated 2 months ago
kwai / Megatron-Kwai
[USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral…
☆61 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆155 Updated last month
AlibabaPAI / FLASHNN
☆96 Updated 11 months ago
alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆94 Updated 2 years ago
madsys-dev / deepseekv2-profile
☆145 Updated 5 months ago
InternLM / turbomind
☆92 Updated 4 months ago
OpenPPL / ppl.llm.serving
☆128 Updated 7 months ago
stepfun-ai / StepMesh
☆209 Updated last week
FlagOpen / FlagCX
☆81 Updated last week
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆256 Updated this week
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆114 Updated this week
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆134 Updated 3 weeks ago
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆79 Updated 8 months ago
OpenPPL / ppl.llm.kernel.cuda
☆149 Updated 6 months ago
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆54 Updated last month
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆110 Updated last year
microsoft / vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
☆405 Updated 2 months ago
yifuwang / symm-mem-recipes
☆102 Updated 7 months ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆400 Updated 2 months ago
ppl-ai / pplx-kernels
Perplexity GPU Kernels
☆418 Updated 3 weeks ago
CalebDu / Awesome-Cute
☆91 Updated 2 months ago
AlibabaPAI / FlashModels
Fast and easy distributed model training examples.
☆13 Updated 8 months ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆185 Updated this week
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆144 Updated 3 years ago
Bruce-Lee-LY / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆39 Updated 5 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆129 Updated 6 months ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆415 Updated 3 months ago
OpenPPL / ppl.nn.llm
☆139 Updated last year
LoongServe / LoongServe
☆110 Updated 8 months ago