pku-liang/MAGIS

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)

☆57

Alternatives and similar repositories for MAGIS

Users that are interested in MAGIS are comparing it to the libraries listed below.

mlc-ai / mlc-python
☆36 · Jul 19, 2025 · Updated last year
pku-liang / TileFlow
TileFlow is a performance analysis tool based on Timeloop for fusion dataflows
☆72 · Apr 12, 2024 · Updated 2 years ago
pku-liang / Hestia
☆17 · Mar 26, 2025 · Updated last year
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆54 · Dec 17, 2024 · Updated last year
zhaiyi000 / tlm
☆49 · Jul 13, 2024 · Updated 2 years ago
tsinghua-ideal / Syno
Source code repository for ASPLOS '25 paper "Syno: Structured Synthesis for Neural Operators"
☆15 · Aug 31, 2025 · Updated 10 months ago
nox-410 / tvm.tl
An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.
☆52 · Jul 23, 2024 · Updated 2 years ago
yuanxinnn / APTMoE
☆13 · Jun 29, 2024 · Updated 2 years ago
xinhao-luo / ClusterFusion
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
☆75 · Dec 11, 2025 · Updated 7 months ago
AlibabaResearch / mononn
☆32 · Jul 17, 2024 · Updated 2 years ago
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆446 · Mar 5, 2026 · Updated 4 months ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆125 · Oct 26, 2022 · Updated 3 years ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆19 · Dec 8, 2023 · Updated 2 years ago
HPCRL / ASPLOS_artifact
☆13 · Nov 1, 2021 · Updated 4 years ago
KnowingNothing / compiler-and-arch
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
☆533 · Jan 15, 2025 · Updated last year
uiuc-arc / felix
Optimize tensor programs fast with Felix, a gradient descent autotuner.
☆33 · Mar 5, 2026 · Updated 4 months ago
pku-liang / Cement
The Next-gen Language & Compiler Powering Efficient Hardware Design
☆39 · Jan 16, 2025 · Updated last year
microsoft / ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
☆223 · Sep 21, 2024 · Updated last year
monellz / FlashTensor
☆19 · Mar 4, 2025 · Updated last year
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆246 · Sep 24, 2023 · Updated 2 years ago
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆769 · Aug 6, 2025 · Updated 11 months ago
summerspringwei / souffle-ae
☆17 · Jan 24, 2024 · Updated 2 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆45 · Nov 13, 2023 · Updated 2 years ago
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33 · Feb 10, 2025 · Updated last year
cowanmeg / cgo-artifact-2020
Artifact repository for the paper "Automatic Generation of High-Performance Quantized Machine Learning Kernels"
☆17 · Oct 13, 2020 · Updated 5 years ago
stganser / polyite
Polyite: Iterative Schedule Optimization for Parallelization in the Polyhedron Model
☆12 · Jan 19, 2020 · Updated 6 years ago
weiya711 / sam
☆18 · Oct 17, 2025 · Updated 9 months ago
oliverYoung2001 / UltraAttn
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16 · Aug 14, 2025 · Updated 11 months ago
mustard-seed / SparseDNNAccelerator
Sparse CNN Accelerator targeting Intel FPGA
☆15 · Aug 26, 2021 · Updated 4 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
BradMcDanel / column-combine
☆27 · Apr 28, 2020 · Updated 6 years ago
tlc-pack / relax
☆193 · Mar 28, 2023 · Updated 3 years ago
uwsampl / paper-agents
☆13 · Dec 9, 2024 · Updated last year
microsoft / ark
A GPU-driven system framework for scalable AI applications
☆130 · Jul 15, 2026 · Updated 2 weeks ago
zejia-lin / BulletServe
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53 · Jan 8, 2026 · Updated 6 months ago
illinois-impact / klap
A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches
☆15 · Jun 21, 2019 · Updated 7 years ago
acsl-technion / TPT
☆15 · May 23, 2023 · Updated 3 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year