flashinfer-ai/debug-print

Debug print operator for cudagraph debugging

☆18

Alternatives and similar repositories for debug-print

Users who are interested in debug-print are comparing it to the libraries listed below.

cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆101 · Sep 19, 2025 · Updated 10 months ago
radixark / miles_diffusion
[Experimental] Miles-diffusion is a post-training framework for large-scale diffusion model training and production workloads, forked fr…
☆21 · Updated this week
uwsampl / paper-agents
☆13 · Dec 9, 2024 · Updated last year
flashinfer-ai / cubloaty
a size profiler for cuda binary
☆71 · Jan 15, 2026 · Updated 6 months ago
yester31 / Cutlass_EX
study of cutlass
☆22 · Nov 10, 2024 · Updated last year
Triang-jyed-driung / i8muon
Muon in Int8 Precision Made Possible
☆20 · Jun 18, 2026 · Updated last month
lemyx / tilelang-dsa
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47 · Nov 19, 2025 · Updated 8 months ago
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆86 · May 5, 2025 · Updated last year
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated last year
mlc-ai / mlc-python
☆36 · Jul 19, 2025 · Updated last year
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
YJMSTR / flash-linear-attention
FLA but cuTile
☆27 · Apr 17, 2026 · Updated 3 months ago
ROCm / hipTensor
AMD’s C++ library for accelerating tensor primitives
☆49 · Updated this week
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
DBOS-project / voltdb
☆18 · Feb 11, 2025 · Updated last year
megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆19 · Jul 21, 2023 · Updated 3 years ago
fishmingyu / qrv2-gpu-mode
Batched square compact-Householder QR factorization.
☆14 · Jul 2, 2026 · Updated 2 weeks ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: A Unified Checkpointing Library for LFMs
☆286 · Feb 2, 2026 · Updated 5 months ago
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
nicolaswilde / amx-gemm-handwritten
Handwritten GEMM using Intel AMX (Advanced Matrix Extension)
☆17 · Jan 11, 2025 · Updated last year
tsinghua-ideal / Syno
Source code repository for ASPLOS '25 paper "Syno: Structured Synthesis for Neural Operators"
☆15 · Aug 31, 2025 · Updated 10 months ago
GATECH-EIC / Robust-Scratch-Ticket
[NeurIPS 2021] "Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks" by Yon…
☆13 · Feb 13, 2022 · Updated 4 years ago
KuangjuX / Paper-reading
My Paper Reading Lists and Notes.
☆25 · May 8, 2026 · Updated 2 months ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆145 · Mar 31, 2023 · Updated 3 years ago
thunlp / BlockFFN
Source code for the paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19 · Jan 10, 2026 · Updated 6 months ago
0xSero / sglang-moet
SGLang-native serving for the Moet sign-symmetric W2 expert format with SM120 W2/W4 kernels, GLM-5.2 NVFP4 TP4 on 4x RTX PRO 6000
☆19 · Jul 10, 2026 · Updated last week
catswe / flash-attention-residuals
Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)
☆86 · May 29, 2026 · Updated last month
HanGuo97 / hilt
☆40 · Dec 14, 2025 · Updated 7 months ago
tsinghua-ideal / Canvas
Canvas: End-to-End Kernel Architecture Search in Neural Networks
☆27 · Nov 18, 2024 · Updated last year
NVIDIA / CompileIQ
An Optimizer for Nvidia Compilers.
☆107 · Jul 3, 2026 · Updated 2 weeks ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆52 · Jul 4, 2025 · Updated last year
microsoft / AttentionEngine
☆123 · May 19, 2025 · Updated last year
VainF / Remix-DiT
☆18 · Dec 11, 2024 · Updated last year
VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Dec 29, 2023 · Updated 2 years ago
tile-ai / TileFoundry
☆54 · Updated this week
sablin39 / tilelang-cuda-skills
Skills for writing tilelang and debugging with CUDA toolkits.
☆131 · May 20, 2026 · Updated 2 months ago
Chtholly-Boss / swizzle
A practical way of learning Swizzle
☆42 · Feb 3, 2025 · Updated last year