apuaaChen/EVT_AE

Artifacts of EVT ASPLOS'24

☆29

Alternatives and similar repositories for EVT_AE

Users who are interested in EVT_AE are comparing it to the repositories listed below.

MoZeWei / moTuner
☆10 · May 12, 2022 · Updated 4 years ago
CalebDu / Awesome-Cute
☆121 · May 16, 2025 · Updated last year
UbiquitousLearning / Mandheling-DSP-Training
The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom'2022]
☆20 · Aug 4, 2022 · Updated 3 years ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆41 · Mar 27, 2025 · Updated last year
ucb-bar / MoCA
☆29 · Feb 26, 2023 · Updated 3 years ago
summerspringwei / souffle-ae
☆17 · Jan 24, 2024 · Updated 2 years ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
ColfaxResearch / cutlass-kernels
☆270 · Jul 11, 2024 · Updated 2 years ago
WeiCheng14159 / bazel-android-opencl
Run OpenCL program on MOBILE GPU (Qualcomm & ARM) !
☆18 · Jun 27, 2018 · Updated 8 years ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 · Oct 5, 2024 · Updated last year
xxcclong / GNN-Computing
Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations"
☆42 · Nov 16, 2021 · Updated 4 years ago
SNU-ARC / OpenDNN
OpenDNN: An Open-source, cuDNN-like Deep Learning Primitive Library
☆29 · Dec 9, 2019 · Updated 6 years ago
mingfeima / pytorch_profiler_parser
Parser script that processes PyTorch autograd profiler results and converts the JSON file to Excel.
☆15 · Oct 8, 2019 · Updated 6 years ago
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
KnowingNothing / compiler-and-arch
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
☆532 · Jan 15, 2025 · Updated last year
spcl / arrow-matrix
Arrow Matrix Decomposition - Communication-Efficient Distributed Sparse Matrix Multiplication
☆15 · Mar 25, 2024 · Updated 2 years ago
SuperScientificSoftwareLaboratory / TileSpGEMM
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y…
☆48 · May 22, 2024 · Updated 2 years ago
HPMLL / NVIDIA-Hopper-Benchmark
☆116 · May 31, 2025 · Updated last year
reed-lau / cute-gemm
☆188 · May 11, 2026 · Updated 2 months ago
HAWAIILAB / cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels
☆33 · Mar 15, 2021 · Updated 5 years ago
spcl / open-earth-compiler
Development repository for the Open Earth Compiler
☆82 · Feb 19, 2021 · Updated 5 years ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆340 · Dec 5, 2025 · Updated 7 months ago
temporal-hpc / reduction-tensor-cores
Fast GPU-based tensor core reductions
☆12 · Jan 13, 2023 · Updated 3 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115 · Jun 28, 2025 · Updated last year
szcompressor / FZ-GPU
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs
☆15 · Jun 21, 2026 · Updated last month
vortexgpgpu / Volt
☆18 · Feb 9, 2026 · Updated 5 months ago
yaohuicai / smoothe-artifact
☆18 · Aug 9, 2025 · Updated 11 months ago
mlsys-seo / ooo-backprop
☆26 · Dec 5, 2022 · Updated 3 years ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
RC4ML / LoHan
A low-cost, high-performance deep learning training framework that enables efficient 100B-scale model fine-tuning on a commodity server w…
☆23 · Mar 21, 2025 · Updated last year
NVlabs / SOLAR
Speed of Light Analysis for ML Model Runtime
☆108 · Jun 10, 2026 · Updated last month
YukeWang96 / GNNAdvisor_OSDI21
Artifact for OSDI'21 GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.
☆71 · Mar 2, 2023 · Updated 3 years ago
ahmedheakl / CASS
[ACL 2026 🔥] CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark
☆35 · Apr 20, 2026 · Updated 3 months ago
TiledTensor / TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19 · May 12, 2024 · Updated 2 years ago
xyzsam / mallacc
Mallacc: Accelerating Memory Allocation
☆13 · Jan 2, 2018 · Updated 8 years ago
UT-InfraAI / cuco
An agent for CUDA compute-communication kernel co-design
☆35 · May 7, 2026 · Updated 2 months ago
gravins / Anti-SymmetricDGN
Official code repository for the papers "Anti-Symmetric DGN: a stable architecture for Deep Graph Networks" accepted at ICLR 2023; "Non-D…
☆15 · Jan 2, 2025 · Updated last year
Xilinx / inference-server
☆48 · Jun 18, 2024 · Updated 2 years ago
IST-DASLab / FP-Quant
☆115 · Feb 26, 2026 · Updated 4 months ago