A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
☆35 · Mar 18, 2026 · Updated last week
Alternatives and similar repositories for CUTracer
Users that are interested in CUTracer are comparing it to the libraries listed below.
- Luthier, a GPU binary instrumentation tool for AMD GPUs ☆27 · Mar 13, 2026 · Updated last week
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆33 · Dec 5, 2025 · Updated 3 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆196 · Mar 18, 2026 · Updated last week
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆20 · Jul 13, 2025 · Updated 8 months ago
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆22 · Apr 25, 2024 · Updated last year
- NVIDIA SASS disassembler/inline patcher ☆44 · Mar 14, 2026 · Updated last week
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆87 · Updated this week
- Accelerating SDF gradient computation in NeuS-like multi-view reconstruction with directional finite difference (DFD) and patch-based sam… ☆34 · Mar 24, 2024 · Updated 2 years ago
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 7 months ago
- [RA-L'24, IROS'24] Official PyTorch Implementation of "Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation" ☆13 · Oct 11, 2024 · Updated last year
- GoLU, a novel, self-gated and element-wise activation function that performs well over a diverse set of tasks ☆24 · Oct 4, 2025 · Updated 5 months ago
- ☆10 · May 12, 2022 · Updated 3 years ago
- Fast OS-level support for GPU checkpoint and restore ☆279 · Sep 28, 2025 · Updated 5 months ago
- ☆14 · Oct 6, 2020 · Updated 5 years ago
- Measures the conformance of a BPF runtime to the ISA. ☆37 · Updated this week
- diffusers with search engine ☆12 · Jan 13, 2026 · Updated 2 months ago
- ☆14 · Mar 8, 2025 · Updated last year
- Framework for Algorithmic Correctness Testing of Operators ☆16 · Mar 9, 2026 · Updated 2 weeks ago
- A WebAssembly eBPF runtime based on wasmtime in rust ☆11 · Feb 20, 2023 · Updated 3 years ago
- Orchestration and memory for multi-agent systems ☆14 · Feb 6, 2026 · Updated last month
- Artifact for 'Register Optimizations for Stencils on GPUs' ☆10 · Sep 18, 2018 · Updated 7 years ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs ☆25 · Sep 23, 2025 · Updated 6 months ago
- naïve blockchain in Rust ☆10 · Nov 13, 2020 · Updated 5 years ago
- ☆12 · Mar 7, 2024 · Updated 2 years ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline. ☆128 · Jul 13, 2024 · Updated last year
- Tornado Web Server git repository for OpenShift with Python 3.3 ☆15 · Dec 13, 2015 · Updated 10 years ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results ☆16 · Updated this week
- This is a game interface for Doudizhu built with Qt, and I only imitated the interface simply. The object has the function of random license… ☆12 · Sep 6, 2018 · Updated 7 years ago
- Notes on optimizing the Linux kernel function csum_partial ☆14 · Nov 28, 2021 · Updated 4 years ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs ☆162 · Jan 13, 2026 · Updated 2 months ago
- Causal Analysis of Agent Behavior for AI Safety ☆20 · Jun 27, 2023 · Updated 2 years ago
- ☆18 · Nov 11, 2025 · Updated 4 months ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆118 · Sep 24, 2025 · Updated 6 months ago
- PyTorch routines for (Ker)nel (Mac)hines ☆11 · Oct 10, 2025 · Updated 5 months ago
- ☆12 · Feb 24, 2023 · Updated 3 years ago
- ☆12 · Jan 19, 2020 · Updated 6 years ago
- ☆13 · Mar 15, 2026 · Updated last week
- ☆79 · Mar 11, 2026 · Updated 2 weeks ago