wu-kan/wuk_cupti_wrapper


A simple API for using CUPTI (the CUDA Profiling Tools Interface)

☆10

Alternatives and similar repositories for wuk_cupti_wrapper

Users who are interested in wuk_cupti_wrapper are comparing it to the libraries listed below.


cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
SJTU-IPADS / MetaAttention
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends (PPoPP'26)
☆16 · Dec 31, 2025 · Updated 6 months ago
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
chenyu-jiang / dcp
Code repository for the SOSP'25 paper DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism.
☆21 · Nov 28, 2025 · Updated 7 months ago
rchardx / hopper-gemm
☆48 · Nov 1, 2025 · Updated 8 months ago
wu-kan / GoPTX
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving
☆21 · Jul 30, 2025 · Updated 11 months ago
liangyuwang / Tiny-Megatron
Tiny-Megatron, a minimalistic re-implementation of the Megatron library
☆32 · Sep 1, 2025 · Updated 10 months ago
tile-ai / tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
☆20 · Updated this week
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
sail-sg / odc
On-demand communication
☆34 · Apr 16, 2026 · Updated 3 months ago
chips-compilers-mlsys-21 / chips-compilers-mlsys-21.github.io
☆11 · Apr 5, 2021 · Updated 5 years ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators built with CUTLASS to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
HuangShiqing / memory_viz_plus
☆18 · Jun 14, 2025 · Updated last year
Dao-AILab / AI-workflow
☆71 · Mar 24, 2026 · Updated 4 months ago
YangLinzhuo / cuda-sgemm-optimization
CUDA SGEMM optimization note
☆15 · Oct 31, 2023 · Updated 2 years ago
Mogball / triton_lite
☆20 · May 24, 2025 · Updated last year
Luniam / SimpleKVStore
A distributed key-value database based on LSM-tree storage
☆15 · Aug 24, 2022 · Updated 3 years ago
oliverYoung2001 / UltraAttn
SC'25 UltraAttn: Efficiently Parallelizing Attention through Hierarchical Context-Tiling
☆16 · Aug 14, 2025 · Updated 11 months ago
GeeeekExplorer / kkbot
A Feishu/Lark AI agent bot
☆15 · Feb 27, 2026 · Updated 4 months ago
lemyx / tilelang-dsa
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47 · Nov 19, 2025 · Updated 8 months ago
cyhdmjzzy / DeepEP-Code-Analysis
☆26 · Feb 27, 2026 · Updated 4 months ago
tile-ai / tilelang-benchmark
☆22 · Jun 10, 2026 · Updated last month
MLSysU / EcoServe
[OSDI'26] Efficient LLM Serving on Commodity GPU Clusters with Data-Reduced Cross-Instance Orchestration
☆23 · Jul 5, 2026 · Updated 3 weeks ago
infinigence / HamiltonAttention
☆45 · Oct 15, 2025 · Updated 9 months ago
MoZeWei / moTuner
☆10 · May 12, 2022 · Updated 4 years ago
aikitoria / nanotrace
Low-overhead tracing library and trace visualizer for pipelined CUDA kernels
☆137 · Jul 17, 2026 · Updated last week
nex-agi / NexVenusCL
Nex Venus Communication Library
☆75 · Nov 17, 2025 · Updated 8 months ago
thomaschlt / mla.c
A from-scratch C implementation of the multi-head latent attention used in the DeepSeek-V3 technical paper.
☆18 · Jan 15, 2025 · Updated last year
foundry-org / foundry
Foundry materializes CUDA graphs along with their execution context to disk to support fast cold start of serving engines.
☆46 · Jul 8, 2026 · Updated 2 weeks ago
zejia-lin / BulletServe
Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration
☆53 · Jan 8, 2026 · Updated 6 months ago
bytedance-iaas / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆30 · Updated this week
eunomia-bpf / cupti-tutorial
Tutorials for NVIDIA CUPTI samples
☆70 · Updated this week
Oneflow-Inc / serving
OneFlow Serving
☆20 · Apr 10, 2025 · Updated last year
Terra-Flux / PolyRL
[NSDI'26] PolyRL is a reinforcement learning framework for LLMs that harvests spot instances on the cloud to reduce cost.
☆19 · Mar 30, 2026 · Updated 3 months ago
microsoft / llm-42
[Accepted to SOSP 2026] Fast Deterministic LLM Inference
☆22 · Updated this week
microsoft / tokenweave
Accepted to MLSys 2026
☆91 · Apr 19, 2026 · Updated 3 months ago
zeroine / cutlass-cute-sample
☆49 · Apr 15, 2024 · Updated 2 years ago
Harry-Chen / fp4_sm120
Make FP4 on 5090 Great Again
☆17 · Updated this week