A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
☆66 · May 19, 2026 · Updated last week
Alternatives and similar repositories for CUTracer
Users that are interested in CUTracer are comparing it to the libraries listed below.
- Luthier, a GPU binary instrumentation tool for AMD GPUs ☆28 · May 21, 2026 · Updated last week
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆207 · Updated this week
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆38 · Apr 30, 2026 · Updated 3 weeks ago
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆21 · Jul 13, 2025 · Updated 10 months ago
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆21 · Apr 25, 2024 · Updated 2 years ago
- This is the repository for codes in paper "ShaderPerFormer: Platform-independent Context-aware Shader Performance Predictor" ☆12 · May 16, 2024 · Updated 2 years ago
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr… ☆107 · May 22, 2026 · Updated last week
- MOLA implementations of the virtual state estimation API for robots / vehicles ☆11 · May 19, 2026 · Updated last week
- A simple calculation for LLM MFU. ☆77 · Sep 10, 2025 · Updated 8 months ago
- NVIDIA SASS disassembler/inline patcher ☆76 · May 21, 2026 · Updated last week
- ☆57 · Feb 24, 2026 · Updated 3 months ago
- Accelerating SDF gradient computation in NeuS-like multi-view reconstruction with directional finite difference (DFD) and patch-based sam… ☆34 · Mar 24, 2024 · Updated 2 years ago
- An experimental communicating attention kernel based on DeepEP. ☆34 · Jul 29, 2025 · Updated 10 months ago
- A Maven dependency graph generator for Bazel ☆17 · May 18, 2026 · Updated last week
- Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024) ☆12 · Jun 20, 2025 · Updated 11 months ago
- ☆10 · May 12, 2022 · Updated 4 years ago
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis. ☆13 · Mar 20, 2025 · Updated last year
- diffusers with search engine ☆12 · Jan 13, 2026 · Updated 4 months ago
- Fast OS-level support for GPU checkpoint and restore ☆281 · Sep 28, 2025 · Updated 8 months ago
- ☆14 · Mar 8, 2025 · Updated last year
- Framework for Algorithmic Correctness Testing of Operators ☆17 · Mar 9, 2026 · Updated 2 months ago
- Measures the conformance of a BPF runtime to the ISA. ☆38 · Updated this week
- A WebAssembly eBPF runtime based on wasmtime in Rust ☆11 · Feb 20, 2023 · Updated 3 years ago
- Orchestration and memory for multi-agent systems ☆15 · Feb 6, 2026 · Updated 3 months ago
- naïve blockchain in Rust ☆10 · Nov 13, 2020 · Updated 5 years ago
- Simulate and Render MuJoCo in the Browser with 3DGS. ☆48 · Apr 16, 2026 · Updated last month
- Physics laboratory assignments ☆10 · Oct 5, 2024 · Updated last year
- High-speed GEMV kernels, at most 2.7x speedup compared to PyTorch baseline. ☆129 · Jul 13, 2024 · Updated last year
- Tornado Web Server git repository for OpenShift with Python 3.3 ☆15 · Dec 13, 2015 · Updated 10 years ago
- This is a game interface called the doudizhu by Qt, and I only imitated the interface simply. The object has the function of random license… ☆12 · Sep 6, 2018 · Updated 7 years ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results ☆16 · May 21, 2026 · Updated last week
- LLVM passes and IR generators code examples ☆15 · Feb 12, 2026 · Updated 3 months ago
- ☆18 · Nov 11, 2025 · Updated 6 months ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs ☆172 · May 9, 2026 · Updated 2 weeks ago
- Multi-Spectral Gaussian Splatting with Neural Color Representation ☆31 · May 6, 2026 · Updated 3 weeks ago
- Repository of the paper 'CodeQueries: A Dataset of Semantic Queries over Code' published in ISEC 2024 ☆13 · Apr 21, 2024 · Updated 2 years ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆103 · Apr 7, 2026 · Updated last month
- ☆16 · Sep 26, 2024 · Updated last year