bytedance / QSync

Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".

☆20

Alternatives and similar repositories for QSync

Users who are interested in QSync are comparing it to the libraries listed below.


zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆14 · Updated 3 years ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 · Updated 7 months ago
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆27 · Updated 5 months ago
casys-kaist / EnvPipe
☆25 · Updated last year
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25 · Updated 7 months ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆52 · Updated 10 months ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Updated 5 months ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 · Updated 2 months ago
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆18 · Updated 2 years ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 · Updated 2 years ago
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆36 · Updated 3 years ago
LeiWang1999 / Stream-k.tvm
☆19 · Updated 9 months ago
microsoft / tokenweave
Efficient Compute-Communication Overlap for Distributed LLM Inference
☆22 · Updated 2 weeks ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆26 · Updated last year
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆27 · Updated 6 months ago
sjtu-epcc / DVABatch
☆19 · Updated 3 years ago
lzhangbv / dear_pytorch
[ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
☆12 · Updated last year
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆19 · Updated last year
Infrawaves / DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport.
☆53 · Updated 2 months ago
zhuangwang93 / Cupcake
Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23)
☆9 · Updated 2 years ago
hku-systems / naspipe
☆14 · Updated 3 years ago
HPMLL / NVIDIA-Hopper-Benchmark
☆49 · Updated last month
strongh2 / sc22-ae
☆13 · Updated 2 years ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆37 · Updated 3 months ago
msr-fiddle / harmony
☆16 · Updated 2 years ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆50 · Updated 3 months ago
zhisbug / Cavs
Cavs: An Efficient Runtime System for Dynamic Neural Networks
☆14 · Updated 4 years ago
zheng-ningxin / SparTA
☆9 · Updated last year
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆40 · Updated last year
yuyangJin / PerFlow-AI
PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.
☆20 · Updated 2 months ago