CalvinXKY / mfu_calculation
A simple calculation for LLM MFU.
☆66 · Sep 10, 2025 · Updated 5 months ago
Alternatives and similar repositories for mfu_calculation
Users interested in mfu_calculation are comparing it to the repositories listed below:
- 💻 Terminal-Agent with Human-in-the-Loop Learning ☆34 · Jan 16, 2026 · Updated 3 weeks ago
- Course repository for the Spring 2023 COMP664 course "Deep Learning" at UNC ☆14 · Apr 17, 2023 · Updated 2 years ago
- An auxiliary project analysis of the characteristics of KV in DiT Attention. ☆32 · Nov 29, 2024 · Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 · Sep 19, 2025 · Updated 4 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 · Aug 19, 2025 · Updated 5 months ago
- ☆118 · May 19, 2025 · Updated 8 months ago
- All-in-one benchmarking platform for evaluating LLM. ☆15 · Nov 12, 2025 · Updated 3 months ago
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆20 · Jul 13, 2025 · Updated 7 months ago
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆46 · Jan 21, 2026 · Updated 3 weeks ago
- LLM4HWDesign Starting Toolkit ☆19 · Oct 4, 2024 · Updated last year
- Triton kernels for Flux ☆22 · Jul 7, 2025 · Updated 7 months ago
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated last year
- parser script to process pytorch autograd profiler result, convert json file to excel. ☆14 · Oct 8, 2019 · Updated 6 years ago
- ☆20 · Oct 10, 2025 · Updated 4 months ago
- ☆131 · Nov 11, 2024 · Updated last year
- Distributed Compiler based on Triton for Parallel Systems ☆1,350 · Updated this week
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆87 · Jan 29, 2026 · Updated 2 weeks ago
- ☆27 · Mar 29, 2025 · Updated 10 months ago
- Here are my personal paper reading notes (including machine learning systems, AI infrastructure, and other interesting stuffs). ☆160 · Jan 27, 2026 · Updated 2 weeks ago
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- PerFlow-AI is a programmable performance analysis, modeling, prediction tool for AI system. ☆28 · Feb 3, 2026 · Updated last week
- ☆20 · Sep 28, 2024 · Updated last year
- LLMA = LLM + Arithmetic coder, which uses an LLM to do insane text data compression. ☆22 · Nov 24, 2024 · Updated last year
- Official Repo for SwS: A Weakness-driven Problem Synthesis Framework in RL for LLM Reasoning ☆42 · Nov 11, 2025 · Updated 3 months ago
- ☆151 · Updated this week
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆84 · Jan 27, 2026 · Updated 2 weeks ago
- A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem. ☆23 · Oct 11, 2024 · Updated last year
- Experiments on Multi-Head Latent Attention ☆99 · Aug 19, 2024 · Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆56 · Jul 23, 2024 · Updated last year
- Applied AI experiments and examples for PyTorch ☆315 · Aug 22, 2025 · Updated 5 months ago
- DeepSeek-V3/R1 inference performance simulator ☆176 · Mar 27, 2025 · Updated 10 months ago
- ☆23 · Oct 19, 2016 · Updated 9 years ago
- possibly useful materials for learning RWKV language model. ☆26 · Jun 8, 2023 · Updated 2 years ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆260 · Nov 18, 2024 · Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆462 · Updated this week
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆193 · Updated this week
- Tile-based language built for AI computation across all scales ☆120 · Updated this week
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆233 · Jun 15, 2025 · Updated 7 months ago
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 6 months ago