jiazhihao / attention_superoptimizerLinks

An Attention Superoptimizer

☆22

Alternatives and similar repositories for attention_superoptimizer

Users that are interested in attention_superoptimizer are comparing it to the libraries listed below

Sorting:

google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23Updated 5 months ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14Updated 11 months ago
casys-kaist / EnvPipe
☆25Updated 2 years ago
bytedance / QSync
Official resporitory for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆54Updated last year
msr-fiddle / harmony
☆17Updated 2 years ago
LeiWang1999 / Stream-k.tvm
☆19Updated last year
jasperzhong / swift
☆15Updated 3 years ago
zhisbug / Cavs
Cavs: An Efficient Runtime System for Dynamic Neural Networks
☆15Updated 5 years ago
illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15Updated 6 years ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆41Updated 2 years ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25Updated 11 months ago
xiezhq-hermann / graphiler
Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into ef…
☆59Updated 3 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆29Updated 10 months ago
hku-systems / naspipe
☆14Updated 3 years ago
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆31Updated 8 months ago
zhuzilin / pytorch-malloc
An external memory allocator example for PyTorch.
☆16Updated 2 months ago
chips-compilers-mlsys-21 / chips-compilers-mlsys-21.github.io
☆11Updated 4 years ago
awslabs / ratex
☆23Updated 2 months ago
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆37Updated 3 years ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆27Updated last year
zhuohan123 / terapipe
☆75Updated 4 years ago
sjtu-epcc / DVABatch
☆21Updated 3 years ago
yuyangJin / PerFlow-AI
PerFlow-AI is a programmable performance analysis, modeling, prediction tool for AI system.
☆24Updated 2 weeks ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆40Updated last year
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17Updated 2 years ago
microsoft / SuperScaler
An experimental parallel training platform
☆54Updated last year
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55Updated last year
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43Updated 3 years ago
ByteDance-Seed / StragglerAnalysis
☆42Updated 5 months ago