A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
☆65Apr 30, 2026Updated this week
Alternatives and similar repositories for CUTracer
Users that are interested in CUTracer are comparing it to the libraries listed below.
- Luthier, a GPU binary instrumentation tool for AMD GPUs☆28Updated this week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels☆203Updated this week
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation.☆36Updated this week
- A Top-Down Profiler for GPU Applications☆22Feb 29, 2024Updated 2 years ago
- [WACV 2026] SceneEdited: A City-Scale Benchmark for 3D HD Map Updating via Image-Guided Change Detection☆16Apr 21, 2026Updated last week
- This is the repository for codes in paper "ShaderPerFormer: Platform-independent Context-aware Shader Performance Predictor"☆12May 16, 2024Updated last year
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…☆102Updated this week
- A simple calculation for LLM MFU.☆77Sep 10, 2025Updated 7 months ago
- [RA-L] SHeRLoc: Synchronized Heterogeneous Radar Place Recognition for Cross-Modal Localization☆28Nov 24, 2025Updated 5 months ago
- Accelerating SDF gradient computation in NeuS-like multi-view reconstruction with directional finite difference (DFD) and patch-based sam…☆34Mar 24, 2024Updated 2 years ago
- An experimental communicating attention kernel based on DeepEP.☆34Jul 29, 2025Updated 9 months ago
- Benchmark scripts for comparing tutorials in PyTorch and JAX☆14Aug 25, 2022Updated 3 years ago
- Isolated Kalman Filtering C++ library☆19Dec 29, 2025Updated 4 months ago
- Script debugger for Grand Theft Auto V.☆23Apr 26, 2026Updated last week
- Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)☆12Jun 20, 2025Updated 10 months ago
- A set of useful algebraic preconditioners for iterative numerical linear-algebraic methods.☆18Jul 23, 2022Updated 3 years ago
- [RA-L'24, IROS'24] Official PyTorch Implementation of "Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation"☆13Oct 11, 2024Updated last year
- ☆10May 12, 2022Updated 3 years ago
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis.☆13Mar 20, 2025Updated last year
- Measures the conformance of a BPF runtime to the ISA.☆37Apr 25, 2026Updated last week
- ☆14Oct 6, 2020Updated 5 years ago
- Fast OS-level support for GPU checkpoint and restore☆282Sep 28, 2025Updated 7 months ago
- ☆14Mar 8, 2025Updated last year
- Framework for Algorithmic Correctness Testing of Operators☆17Mar 9, 2026Updated last month
- A WebAssembly eBPF runtime based on wasmtime in rust☆11Feb 20, 2023Updated 3 years ago
- Orchestration and memory for multi-agent systems☆14Feb 6, 2026Updated 2 months ago
- A High performance and tiny TVM graph executor library written in C which can compile to WebAssembly and use CUDA/WebGPU as the accelerat…☆12Aug 3, 2023Updated 2 years ago
- Artifact for 'Register Optimizations for Stencils on GPUs'☆10Sep 18, 2018Updated 7 years ago
- naïve blockchain in Rust☆10Nov 13, 2020Updated 5 years ago
- ☆12Mar 7, 2024Updated 2 years ago
- Physics laboratory assignments☆10Oct 5, 2024Updated last year
- Tornado Web Server git repository for OpenShift with Python 3.3☆15Dec 13, 2015Updated 10 years ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results☆16Updated this week
- This is a Qt game interface for Doudizhu; I only imitated the interface. The object has the function of random license…☆12Sep 6, 2018Updated 7 years ago
- An Implementation of LoRa for EmComm (Emergency Communication) or (TacComm) Tactical Communication☆20Jul 23, 2025Updated 9 months ago
- LLVM passes and IR generators code examples☆15Feb 12, 2026Updated 2 months ago
- Notes on optimizing the linux kernel function csum_partial☆14Nov 28, 2021Updated 4 years ago
- Causal Analysis of Agent Behavior for AI Safety☆20Jun 27, 2023Updated 2 years ago
- ☆18Nov 11, 2025Updated 5 months ago