strongh2 / sc22-ae

☆13

Alternatives and similar repositories for sc22-ae

Users who are interested in sc22-ae are comparing it to the repositories listed below.

chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆18 Updated 2 years ago
hku-systems / naspipe
☆14 Updated 3 years ago
illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15 Updated 6 years ago
AIS-SNU / Optimus-CC
[ASPLOS'23] Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
☆6 Updated 11 months ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 Updated 6 months ago
sarchlab / triosim
☆26 Updated last month
Jiacheng / honeycomb-osdi23-ae
☆16 Updated 2 years ago
RC4ML / RPCNIC
RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025]
☆11 Updated 7 months ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆26 Updated 2 years ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 Updated 8 months ago
PKUZHOU / PetS-ATC-2022
☆10 Updated last year
casys-kaist / EnvPipe
☆25 Updated last year
VITA-Group / Q-Hitter
☆14 Updated last year
bytedance / QSync
Official repository for "IPDPS'24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 Updated last year
SwarmArch / T4
Code released to accompany the ISCA paper: "T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware"
☆29 Updated 3 years ago
dywsjtu / apparate
Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]
☆25 Updated 8 months ago
sjtu-epcc / DVABatch
☆20 Updated 3 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆53 Updated 11 months ago
jasperzhong / swift
☆15 Updated 3 years ago
he-actlab / polymath
☆21 Updated 5 months ago
Froot-NetSys / Arya
Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling
☆13 Updated last year
qgwang-hust / GraSU
A Fast Graph Update Library for FPGA-based Dynamic Graph Processing
☆9 Updated 3 years ago
microsoft / tokenweave
Efficient Compute-Communication Overlap for Distributed LLM Inference
☆26 Updated last month
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 Updated 2 months ago
SJTU-IPADS / PipeLLM
☆19 Updated 7 months ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆53 Updated last year
ceruleangu / Block-Sparse-Benchmark
Benchmark for matrix multiplications between dense and block-sparse (BSR) matrices in TVM, blocksparse (Gray et al.), and cuSparse.
☆24 Updated 4 years ago
PKUZHOU / NeoMem-MICRO-2024
The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
☆54 Updated 11 months ago
Linestro / GRACE
Artifact of ASPLOS'23 paper entitled: GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference
☆19 Updated 2 years ago
SNU-ARC / flashneuron
☆39 Updated 2 years ago