PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.
☆29 · Feb 3, 2026 · Updated last month
Alternatives and similar repositories for PerFlow-AI
Users who are interested in PerFlow-AI are comparing it to the libraries listed below.
- Domain-specific framework for performance analysis of parallel programs ☆16 · Feb 11, 2026 · Updated 3 weeks ago
- Everything about PACMAN! ☆14 · Dec 18, 2025 · Updated 2 months ago
- A GPU FP32 computation method with Tensor Cores. ☆26 · Dec 8, 2025 · Updated 2 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 · May 12, 2025 · Updated 9 months ago
- A Triton-only attention backend for vLLM ☆24 · Feb 11, 2026 · Updated 3 weeks ago
- Light-weight Performance Variance Detection for Production-run Parallel Applications ☆16 · Aug 28, 2023 · Updated 2 years ago
- ☆41 · Jun 5, 2024 · Updated last year
- Website for Stanford SysML Seminar ☆17 · Oct 27, 2025 · Updated 4 months ago
- ☆44 · Updated this week
- A language and compiler for irregular tensor programs. ☆152 · Nov 29, 2024 · Updated last year
- Python tools for meshing rivers ☆12 · Oct 2, 2025 · Updated 5 months ago
- ☆28 · Dec 3, 2025 · Updated 3 months ago
- Building the Virtuous Cycle for AI-driven LLM Systems ☆192 · Updated this week
- Well-annotated word2vec source code with detailed bilingual comments ☆10 · Oct 3, 2021 · Updated 4 years ago
- RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s ☆37 · May 11, 2022 · Updated 3 years ago
- Prefix-Aware Attention for LLM Decoding ☆29 · Jan 23, 2026 · Updated last month
- A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node ☆63 · Dec 19, 2025 · Updated 2 months ago
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion ☆32 · May 15, 2024 · Updated last year
- PetPS: Supporting Huge Embedding Models with Tiered Memory ☆33 · May 21, 2024 · Updated last year
- ☆74 · Sep 15, 2025 · Updated 5 months ago
- netbeacon - monitoring your network capture, NIDS or network analysis process ☆19 · Oct 26, 2013 · Updated 12 years ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆90 · Jan 7, 2026 · Updated last month
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks. ☆14 · Oct 3, 2022 · Updated 3 years ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Apr 28, 2023 · Updated 2 years ago
- ☆10 · Sep 4, 2021 · Updated 4 years ago
- Optimize with SigOpt with this standalone SigOpt client driver. ☆12 · Feb 23, 2026 · Updated last week
- ☆42 · Oct 11, 2021 · Updated 4 years ago
- ☆15 · Jul 18, 2023 · Updated 2 years ago
- ☆10 · Jun 4, 2021 · Updated 4 years ago
- Chaitin-Briggs register-allocation algorithm (LLVM back-end) ☆12 · Jan 6, 2016 · Updated 10 years ago
- ☆11 · Oct 21, 2023 · Updated 2 years ago
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- Automated bottleneck detection and solution orchestration ☆19 · Feb 24, 2026 · Updated last week
- A Java port of Jieba 0.39 that supports all core features of the original Jieba ☆12 · Feb 14, 2019 · Updated 7 years ago
- Implementation of the BLE neighbor discovery simulation framework in paper "Blender: Toward Practical Simulation Framework for BLE Neighb… ☆15 · Feb 7, 2023 · Updated 3 years ago
- ☆11 · Apr 3, 2023 · Updated 2 years ago
- Distributed, Replicated, Protocol-generic Key-value Store in Async Rust For SMR Protocols Research ☆17 · Updated this week
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 6 months ago
- ☆11 · Sep 22, 2017 · Updated 8 years ago