hku-systems / naspipe

☆14

Alternatives and similar repositories for naspipe

Users who are interested in naspipe are comparing it to the repositories listed below.

casys-kaist / EnvPipe
☆25 Updated last year
sjtu-epcc / DVABatch
☆20 Updated 3 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆53 Updated 11 months ago
SymbioticLab / ModelKeeper
A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup
☆35 Updated 2 years ago
Raphael-Hao / Abacus
☆37 Updated last month
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆26 Updated 2 years ago
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆31 Updated 5 months ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆40 Updated last year
casys-kaist / HUVM
☆24 Updated 2 years ago
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 Updated last year
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46 Updated last year
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆42 Updated 3 years ago
xiezhq-hermann / graphiler
Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into ef…
☆59 Updated 2 years ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 Updated 2 months ago
YukeWang96 / GNNAdvisor_OSDI21
Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.
☆66 Updated 2 years ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆53 Updated last year
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆36 Updated 3 years ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25 Updated 8 months ago
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆13 Updated last year
alpa-projects / mms
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)
☆83 Updated 2 years ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆108 Updated last year
msr-fiddle / CheckFreq
☆55 Updated 4 years ago
zhisbug / Cavs
Cavs: An Efficient Runtime System for Dynamic Neural Networks
☆14 Updated 4 years ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 Updated 6 months ago
YukeWang96 / TC-GNN_ATC23
Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
☆49 Updated last year
msr-fiddle / dnn-partitioning
☆40 Updated 4 years ago
strongh2 / sc22-ae
☆13 Updated 3 years ago
Soroosh129 / NeuOS
Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems"
☆22 Updated 4 years ago
msr-fiddle / synergy
☆51 Updated 2 years ago
parasailteam / coconet
☆80 Updated 2 years ago