ihavnoid / tg4perfetto
A simple Python library for generating your own Perfetto traces for your application. It can be used both for app instrumentation and for custom trace generation (for your own purposes).
☆17 · Updated 2 months ago
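For context on what "custom trace generation" looks like, here is a minimal sketch. It does not use tg4perfetto's own API (whose exact functions are not assumed here); instead it writes events in the legacy Chrome JSON trace-event format, which the Perfetto UI (ui.perfetto.dev) can open directly. The `emit` helper and the output file name are hypothetical.

```python
import json
import time

# Minimal sketch of custom trace generation (not tg4perfetto's actual API):
# emit events in the legacy Chrome JSON trace-event format, which the
# Perfetto UI can load directly.
events = []

def emit(name, phase, pid=1, tid=1):
    # "ph" is the event phase: "B" begins a duration slice, "E" ends it.
    # "ts" is the timestamp in microseconds.
    events.append({
        "name": name,
        "ph": phase,
        "pid": pid,
        "tid": tid,
        "ts": time.perf_counter_ns() // 1000,
    })

emit("load_model", "B")
time.sleep(0.01)  # stand-in for real work
emit("load_model", "E")

with open("example_trace.json", "w") as f:
    json.dump({"traceEvents": events}, f)
```

Opening the resulting file in the Perfetto UI shows one "load_model" slice on a single track; a library like tg4perfetto wraps this kind of bookkeeping behind a friendlier instrumentation API.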
Alternatives and similar repositories for tg4perfetto
Users interested in tg4perfetto are comparing it to the libraries listed below.
- Benchmark tests supporting the TiledCUDA library. ☆17 · Updated 9 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆27 · Updated 8 months ago
- ☆27 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆95 · Updated 2 months ago
- Quantized Attention on GPU ☆44 · Updated 9 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 · Updated 10 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆81 · Updated this week
- ☆42 · Updated 4 months ago
- ☆19 · Updated 11 months ago
- TiledKernel is a code generation library based on macro kernels and a memory-hierarchy graph data structure. ☆19 · Updated last year
- ☆11 · Updated 7 months ago
- torch.compile artifacts for common deep learning models; can be used as a learning resource for torch.compile ☆17 · Updated last year
- ☆92 · Updated 3 months ago
- GPTQ inference TVM kernel ☆40 · Updated last year
- ☆33 · Updated last year
- An auxiliary project analyzing the characteristics of KV in DiT attention. ☆32 · Updated 9 months ago
- ☆47 · Updated last week
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 · Updated 2 months ago
- A Top-Down Profiler for GPU Applications ☆20 · Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 · Updated last year
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend ☆68 · Updated this week
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆19 · Updated last year
- ☆50 · Updated 3 months ago
- Compression for Foundation Models ☆35 · Updated last month
- ☆63 · Updated 4 months ago
- Transformers components but in Triton ☆34 · Updated 3 months ago
- The official implementation of the DAC 2024 paper GQA-LUT ☆20 · Updated 8 months ago
- FlexAttention with FlashAttention3 support ☆27 · Updated 11 months ago
- An easily extensible framework for understanding and optimizing CUDA operators, intended for learning use only. ☆16 · Updated last year
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators ☆19 · Updated last week