Chtholly-Boss/swizzle

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Chtholly-Boss/swizzle)

Chtholly-Boss / swizzle

A practical way of learning Swizzle

☆45

Alternatives and similar repositories for swizzle

Users that are interested in swizzle are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JJXiangJiaoJun / cutlass_gemv
View on GitHub
GEMV implementation with CUTLASS
☆21Aug 21, 2025Updated 11 months ago
vllm-project / tml-fa4
View on GitHub
FA4-based Relative Attention Kernel developed by TML and Colfax
☆17Jul 17, 2026Updated last week
TiledTensor / TiledBench
View on GitHub
Benchmark tests supporting the TiledCUDA library.
☆19Nov 19, 2024Updated last year
CalebDu / Awesome-Cute
View on GitHub
☆121May 16, 2025Updated last year
leimao / CUTLASS-Examples
View on GitHub
CUTLASS and CuTe Examples
☆136Nov 30, 2025Updated 7 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
zeroine / cutlass-cute-sample
View on GitHub
☆49Apr 15, 2024Updated 2 years ago
HanGuo97 / hilt
View on GitHub
☆40Dec 14, 2025Updated 7 months ago
HydraQYH / hp_rms_norm
View on GitHub
High performance RMSNorm Implement by using SM Core Storage(Registers and Shared Memory)
☆30Jan 22, 2026Updated 6 months ago
DD-DuDa / Cute-Learning
View on GitHub
Examples of CUDA implementations by Cutlass CuTe
☆279Jul 1, 2025Updated last year
weishengying / tiny-flash-attention
View on GitHub
使用 cutlass 实现 flash-attention 精简版，具有教学意义
☆59Aug 12, 2024Updated last year
nicolaswilde / amx-gemm-handwritten
View on GitHub
Handwritten GEMM using Intel AMX (Advanced Matrix Extension)
☆17Jan 11, 2025Updated last year
HydraQYH / expert_specialization_moe
View on GitHub
Expert Specialization MoE Solution based on CUTLASS
☆27Apr 14, 2026Updated 3 months ago
caiwanxianhust / FasterLLaMA
View on GitHub
使用 CUDA C++ 实现的 llama 模型推理框架
☆64Nov 8, 2024Updated last year
HarryWu99 / funny_cute
View on GitHub
Some funny cute/cuteDSL code snippets
☆33Mar 2, 2026Updated 4 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
Bruce-Lee-LY / cuda_auto_tune
View on GitHub
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
☆23Apr 10, 2026Updated 3 months ago
cherichy / tilecute
View on GitHub
☆32Jul 2, 2025Updated last year
sablin39 / tilelang-cuda-skills
View on GitHub
Skills for writing tilelang and debugging with CUDA toolkits.
☆133May 20, 2026Updated 2 months ago
YJMSTR / flash-linear-attention
View on GitHub
FLA but cuTile
☆27Apr 17, 2026Updated 3 months ago
deciding / cutez
View on GitHub
CuTeDSL tutorials, tools, autotuner, profiler, etc.
☆41Jun 27, 2026Updated 3 weeks ago
NTT123 / cute-viz
View on GitHub
Cute layout visualization
☆44Jan 18, 2026Updated 6 months ago
NVIDIA / CompileIQ
View on GitHub
An Optimizer for Nvidia Compilers.
☆110Jul 3, 2026Updated 3 weeks ago
Bruce-Lee-LY / decoding_attention
View on GitHub
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
☆47Jun 11, 2025Updated last year
ademeure / DeeperGEMM
View on GitHub
DeeperGEMM: crazy optimized version
☆86May 5, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
reed-lau / cute-gemm
View on GitHub
☆188May 11, 2026Updated 2 months ago
antgroup / DeepXTrace
View on GitHub
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆101Jan 16, 2026Updated 6 months ago
leepoly / sm-profiler
View on GitHub
☆83Feb 5, 2026Updated 5 months ago
rchardx / hopper-gemm
View on GitHub
☆48Nov 1, 2025Updated 8 months ago
flashinfer-ai / cutlass-viz
View on GitHub
☆65Apr 26, 2025Updated last year
weishengying / cute_gemm
View on GitHub
☆23Aug 14, 2024Updated last year
weishengying / cutlass_flash_atten_fp8
View on GitHub
使用 cutlass 仓库在 ada 架构上实现 fp8 的 flash attention
☆82Aug 12, 2024Updated last year
Bruce-Lee-LY / cutlass_gemm
View on GitHub
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20Aug 3, 2025Updated 11 months ago
caiwanxianhust / CUDA-BLOG
View on GitHub
存放一些 CUDA 编程相关的博客文件。
☆22Oct 16, 2025Updated 9 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
microsoft / TileFusion
View on GitHub
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆115Jun 28, 2025Updated last year
Bruce-Lee-LY / cuda_hgemm
View on GitHub
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆557Sep 8, 2024Updated last year
flashinfer-ai / debug-print
View on GitHub
Debug print operator for cudagraph debugging
☆18Aug 2, 2024Updated last year
lemyx / tilelang-dsa
View on GitHub
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆47Nov 19, 2025Updated 8 months ago
tile-ai / TileFoundry
View on GitHub
☆54Updated this week
yester31 / Cutlass_EX
View on GitHub
study of cutlass
☆22Nov 10, 2024Updated last year
KuangjuX / AttnLink
View on GitHub
An experimental communicating attention kernel based on DeepEP.
☆34Jul 29, 2025Updated 11 months ago