ihavnoid / tg4perfetto

A simple Python library for generating your own Perfetto traces for your application. It can be used for both app instrumentation and custom trace generation (for your own purposes).

☆17

Alternatives and similar repositories for tg4perfetto

Users who are interested in tg4perfetto are comparing it to the libraries listed below.

TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆17 Updated 9 months ago
microsoft / FractalTensor
FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …
☆27 Updated 8 months ago
microsoft / cusync
☆27 Updated last year
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆95 Updated 2 months ago
feifeibear / ChituAttention
Quantized Attention on GPU
☆44 Updated 9 months ago
Lin-Mao / DrGPUM
A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.
☆25 Updated 10 months ago
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆81 Updated this week
ademeure / cuda-side-boost
☆42 Updated 4 months ago
LeiWang1999 / Stream-k.tvm
☆19 Updated 11 months ago
TiledTensor / TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19 Updated last year
ACA-Lab-SJTU / token-ring
☆11 Updated 7 months ago
thuml / learn_torch.compile
torch.compile artifacts for common deep learning models; can be used as a learning resource for torch.compile
☆17 Updated last year
microsoft / AttentionEngine
☆92 Updated 3 months ago
LeiWang1999 / AutoGPTQ.tvm
GPTQ inference TVM kernel
☆40 Updated last year
Qualcomm-AI-research / gptvq
☆33 Updated last year
xdit-project / DiTCacheAnalysis
An auxiliary project analyzing the characteristics of KV in DiT attention.
☆32 Updated 9 months ago
tile-ai / TileOPs
☆47 Updated last week
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆43 Updated 2 months ago
FindHao / drgpu
A Top-Down Profiler for GPU Applications
☆20 Updated last year
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆41 Updated last year
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆68 Updated this week
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆19 Updated last year
tile-ai / AttentionEngine
☆50 Updated 3 months ago
eth-easl / deltazip
Compression for Foundation Models
☆35 Updated last month
flashinfer-ai / cutlass-viz
☆63 Updated 4 months ago
dame-cell / Triformer
Transformers components but in Triton
☆34 Updated 3 months ago
PingchengDong / GQA-LUT
The official implementation of the DAC 2024 paper GQA-LUT
☆20 Updated 8 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27 Updated 11 months ago
cassiewilliam / cuda_op_benchmark
An easily extensible framework for understanding and optimizing CUDA operators; intended for learning purposes only.
☆16 Updated last year
tile-ai / tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
☆19 Updated last week