yuyangJin/PerFlow-AI

PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.

☆29

Alternatives and similar repositories for PerFlow-AI

Users who are interested in PerFlow-AI are comparing it to the repositories listed below.

yuyangJin / PerFlow
Domain-specific framework for performance analysis of parallel programs
☆16 · Feb 11, 2026 · Updated 3 weeks ago
thu-pacman / lab-guide
Everything about PACMAN!
☆14 · Dec 18, 2025 · Updated 2 months ago
JohndeVostok / APE
A GPU FP32 computation method with Tensor Cores.
☆26 · Dec 8, 2025 · Updated 2 months ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆25 · May 12, 2025 · Updated 9 months ago
foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆24 · Feb 11, 2026 · Updated 3 weeks ago
thu-pacman / VAPRO
Light-weight Performance Variance Detection for Production-run Parallel Applications
☆16 · Aug 28, 2023 · Updated 2 years ago
heheda12345 / MagPy
☆41 · Jun 5, 2024 · Updated last year
stanford-sysml-seminar / stanford-sysml-seminar.github.io
Website for Stanford SysML Seminar
☆17 · Oct 27, 2025 · Updated 4 months ago
aws-neuron / nki-library
☆44 · Updated this week
roastduck / FreeTensor
A language and compiler for irregular tensor programs.
☆152 · Nov 29, 2024 · Updated last year
schism-dev / RiverMeshTools
Python tools for meshing rivers
☆12 · Oct 2, 2025 · Updated 5 months ago
NYCU-AI-EDA / Netlistify
☆28 · Dec 3, 2025 · Updated 3 months ago
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆192 · Updated this week
catqaq / NLP-Notes
Word2vec source code with detailed bilingual annotations (well-annotated word2vec)
☆10 · Oct 3, 2021 · Updated 4 years ago
thu-pacman / RisGraph
RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s
☆37 · May 11, 2022 · Updated 3 years ago
flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆29 · Jan 23, 2026 · Updated last month
NVIDIA / ib-traffic-monitor
A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node
☆63 · Dec 19, 2025 · Updated 2 months ago
UofT-EcoSystem / hfta
Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion
☆32 · May 15, 2024 · Updated last year
thustorage / PetPS
PetPS: Supporting Huge Embedding Models with Tiered Memory
☆33 · May 21, 2024 · Updated last year
kvcache-ai / TrEnv-X
☆74 · Sep 15, 2025 · Updated 5 months ago
adulau / netbeacon
netbeacon - monitoring your network capture, NIDS or network analysis process
☆19 · Oct 26, 2013 · Updated 12 years ago
MeshInfra / WaferLLM
WaferLLM: Large Language Model Inference at Wafer Scale
☆90 · Jan 7, 2026 · Updated last month
zhangjiong724 / autoassist-exp
Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.
☆14 · Oct 3, 2022 · Updated 3 years ago
lzhangbv / acpsgd
[ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
☆10 · Apr 28, 2023 · Updated 2 years ago
Celeste-cj / video_toolkits
☆10 · Sep 4, 2021 · Updated 4 years ago
sigopt / sigoptlite
Optimize with SigOpt with this standalone SigOpt client driver.
☆12 · Feb 23, 2026 · Updated last week
thu-pacman / GraphPi
☆42 · Oct 11, 2021 · Updated 4 years ago
datenlord / etcd-client
☆15 · Jul 18, 2023 · Updated 2 years ago
CompML / survey-deep-gradient-compression
☆10 · Jun 4, 2021 · Updated 4 years ago
tgangwani / RegAlloc
Chaitin-Briggs register-allocation algorithm (LLVM back-end)
☆12 · Jan 6, 2016 · Updated 10 years ago
wangshicheng1225 / LoRDMA
☆11 · Oct 21, 2023 · Updated 2 years ago
CSU-NetLab / A2TP-Eurosys2023
☆11 · Mar 13, 2023 · Updated 2 years ago
AMDResearch / intelliperf
Automated bottleneck detection and solution orchestration
☆19 · Feb 24, 2026 · Updated last week
zhongxinghong / Java-Jieba
A Java port of Jieba 0.39 that supports all core features of the original Jieba
☆12 · Feb 14, 2019 · Updated 7 years ago
litonglab / blender-neighbor-discovery
Implementation of the BLE neighbor discovery simulation framework in paper "Blender: Toward Practical Simulation Framework for BLE Neighb…
☆15 · Feb 7, 2023 · Updated 3 years ago
zxytim / arithmetic-encoding-compression
☆11 · Apr 3, 2023 · Updated 2 years ago
josehu07 / summerset
Distributed, Replicated, Protocol-generic Key-value Store in Async Rust For SMR Protocols Research
☆17 · Updated this week
simveit / persistent_dense_gemm
Persistent dense gemm for Hopper in `CuTeDSL`
☆15 · Aug 9, 2025 · Updated 6 months ago
P4xos / P4xos
☆11 · Sep 22, 2017 · Updated 8 years ago