eunomia-bpf / cupti-tutorial
Tutorials for NVIDIA CUPTI samples
☆24 Updated last week
Alternatives and similar repositories for cupti-tutorial
Users who are interested in cupti-tutorial are comparing it to the libraries listed below.
- An experimental communicating attention kernel based on DeepEP. ☆34 Updated last month
- ☆42 Updated 4 months ago
- ☆55 Updated 3 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 10 months ago
- DeeperGEMM: crazy optimized version ☆70 Updated 4 months ago
- ☆27 Updated 6 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆37 Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆95 Updated 2 months ago
- A lightweight design for computation-communication overlap. ☆161 Updated this week
- ☆63 Updated 4 months ago
- Microsoft Collective Communication Library ☆67 Updated 9 months ago
- ☆84 Updated 5 months ago
- ☆23 Updated last week
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling ☆39 Updated 2 weeks ago
- Tile-based language built for AI computation across all scales ☆48 Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆165 Updated 5 months ago
- Aims to implement dual-port and multi-qp solutions in deepEP ibrc transport ☆61 Updated 3 months ago
- Thunder Research Group's Collective Communication Library ☆40 Updated last month
- Sample Codes using NVSHMEM on Multi-GPU ☆24 Updated 2 years ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments. ☆47 Updated last week
- ☆27 Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 Updated 9 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆22 Updated 3 months ago
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆68 Updated this week
- ☆47 Updated 8 months ago
- GPU Performance Advisor ☆66 Updated 3 years ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆56 Updated this week
- ☆39 Updated last year
- ☆28 Updated 2 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆27 Updated 8 months ago