ihavnoid / tg4perfetto
Simple Python library for generating Perfetto traces for your own application. Can be used for both app instrumentation and custom trace generation.
☆20 Updated 4 months ago
Alternatives and similar repositories for tg4perfetto
Users who are interested in tg4perfetto are comparing it to the libraries listed below.
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆29 Updated 10 months ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆100 Updated 4 months ago
- ☆24 Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆85 Updated last month
- ☆49 Updated 6 months ago
- Benchmark tests supporting the TiledCUDA library. ☆17 Updated 11 months ago
- ☆19 Updated last year
- ☆56 Updated last week
- ☆31 Updated 4 months ago
- Quantized Attention on GPU ☆44 Updated 11 months ago
- GPTQ inference TVM kernel ☆39 Updated last year
- ☆106 Updated 5 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆26 Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 Updated 2 months ago
- ☆13 Updated 2 weeks ago
- torch.compile artifacts for common deep learning models; can be used as a learning resource for torch.compile ☆18 Updated last year
- ☆50 Updated 5 months ago
- A practical way of learning Swizzle ☆32 Updated 9 months ago
- Efficient Compute-Communication Overlap for Distributed LLM Inference ☆62 Updated 2 weeks ago
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆82 Updated this week
- ☆65 Updated 6 months ago
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆76 Updated last week
- A Top-Down Profiler for GPU Applications ☆22 Updated last year
- DeeperGEMM: crazy optimized version ☆73 Updated 6 months ago
- ☆12 Updated 10 months ago
- An LLM-based AI agent that automatically writes correct and efficient GPU kernels. ☆38 Updated this week
- Tutorials for NVIDIA CUPTI samples ☆38 Updated 2 weeks ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 Updated last year
- Debug print operator for cudagraph debugging ☆14 Updated last year
- Artifacts of EVT ASPLOS'24 ☆28 Updated last year