AMD-AGI / GEAK-agent

An LLM-based AI agent that automatically writes correct and efficient GPU kernels.

☆58

Alternatives and similar repositories for GEAK-agent

Users who are interested in GEAK-agent are comparing it to the libraries listed below.

meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation via Deep Agents
☆223 Updated this week
ademeure / DeeperGEMM
DeeperGEMM: crazy optimized version
☆73 Updated 8 months ago
flashinfer-ai / cutlass-viz
☆65 Updated 9 months ago
flashinfer-ai / flashinfer-bench
Building the Virtuous Cycle for AI-driven LLM Systems
☆140 Updated this week
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆105 Updated 7 months ago
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆92 Updated this week
microsoft / AttentionEngine
☆117 Updated 8 months ago
infinigence / FUSCO
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
☆109 Updated last month
IBM / triton-dejavu
Framework to reduce autotune overhead to zero for well known deployments.
☆94 Updated 4 months ago
triton-lang / kernels
☆102 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆213 Updated last week
ByteDance-Seed / cudaLLM
☆128 Updated 5 months ago
INT-FlashAttention2024 / INT-FlashAttention
☆84 Updated last year
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆117 Updated 2 weeks ago
zhuzilin / flash-attention-with-sink
☆38 Updated 5 months ago
cherichy / tilecute
☆32 Updated 6 months ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆319 Updated this week
dsl-learn / cutile-learn
NVIDIA cuTile learn
☆154 Updated last month
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆91 Updated 2 weeks ago
osayamenja / FlashMoE
Distributed MoE in a Single Kernel [NeurIPS '25]
☆188 Updated this week
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆105 Updated this week
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆148 Updated 8 months ago
tile-ai / AttentionEngine
☆52 Updated 8 months ago
OpenBitSys / BitDecoding
[HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.
☆79 Updated last month
DeepLink-org / DLCompiler
triton for dsa
☆56 Updated this week
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆97 Updated last month
alibaba / easydist
Automated Parallelization System and Infrastructure for Multiple Ecosystems
☆82 Updated last year
PipeFusion / PipeFusion
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
☆55 Updated last year
meta-pytorch / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
☆189 Updated this week
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆35 Updated 6 months ago