Ther-nullptr / Awesome-Transformer-Accleration

Paper list for acceleration of transformers

☆13

Alternatives and similar repositories for Awesome-Transformer-Accleration

Users who are interested in Awesome-Transformer-Accleration are comparing it to the repositories listed below.

zyqCSL / DiffKV
☆28 Updated last month
chhzh123 / ptc-tutorial
PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo
☆17 Updated 2 years ago
xshaun / sc22-ae
☆14 Updated last week
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55 Updated last year
sjtu-epcc / DVABatch
☆21 Updated 3 years ago
leesou / PIM-DL-ASPLOS
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
☆33 Updated last year
ucb-bar / dosa
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
☆18 Updated last year
NaelF / BinaryCoP
Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices
☆12 Updated 4 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆54 Updated last year
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆32 Updated 9 months ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 Updated 11 months ago
pku-liang / TileFlow
TileFlow is a performance analysis tool based on Timeloop for fusion dataflows
☆62 Updated last year
union-codesign / union
☆14 Updated 4 years ago
stepbuystep / LightNAS
You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms
☆11 Updated 2 years ago
HPCRL / ASPLOS_artifact
☆13 Updated 4 years ago
hku-systems / naspipe
☆14 Updated 3 years ago
tsinghua-ideal / Canvas
Canvas: End-to-End Kernel Architecture Search in Neural Networks
☆26 Updated 11 months ago
uiuc-arc / felix
Optimize tensor programs fast with Felix, a gradient descent autotuner.
☆28 Updated last year
BoyuanFeng / APNN-TC
☆19 Updated 4 years ago
uwsampl / sparsetir-artifact
Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"
☆26 Updated 2 years ago
bytedance / QSync
Official repository for "IPDPS'24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20 Updated last year
lixiuhong / batched_gemm
☆39 Updated 5 years ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆29 Updated 10 months ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆38 Updated 7 months ago
pku-liang / HASCO
Agile hardware-software co-design
☆52 Updated 3 years ago
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 Updated last year
PolyArch / dsagen2
Domain-Specific Architecture Generator 2
☆21 Updated 3 years ago
ParCIS / Ok-Topk
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…
☆27 Updated 2 years ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 Updated 6 months ago
hgyhungry / alcop-artifact
☆23 Updated 2 years ago