OpenMLIR / TritonLLM

LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model

☆36

Alternatives and similar repositories for TritonLLM

Users who are interested in TritonLLM are comparing it to the libraries listed below.

FlagTree / flagtree
FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton.
☆83 Updated last week
sunkx109 / GPUs-Specs
Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM
☆63 Updated last month
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆108 Updated 3 months ago
tile-ai / tilescale
Tile-based language built for AI computation across all scales
☆51 Updated this week
gty111 / gLLM
gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling
☆39 Updated this week
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
☆70 Updated this week
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆60 Updated this week
tsinghua-ideal / Canvas
Canvas: End-to-End Kernel Architecture Search in Neural Networks
☆27 Updated 9 months ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆91 Updated 2 years ago
shenh10 / DeepSeek_Simulator
☆84 Updated 5 months ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 Updated 7 months ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆112 Updated last year
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 Updated 9 months ago
InfiniTensor / InfiniTensor
☆253 Updated this week
gfvvz / triton-learning-materials
Triton Compiler related materials.
☆31 Updated 8 months ago
Oneflow-Inc / dfccl
☆27 Updated 6 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆165 Updated this week
xlite-dev / HGEMM
⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
☆109 Updated 4 months ago
sjtu-epcc / Tacker
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆31 Updated 7 months ago
heheda12345 / MagPy
☆39 Updated last year
lipracer / cuda-rt-hook
☆42 Updated last month
antgroup / DeepXTrace
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
☆47 Updated last week
CalebDu / Awesome-Cute
☆101 Updated 3 months ago
Yongqi-Zhuo / triton-tvm
Triton to TVM transpiler.
☆22 Updated 10 months ago
KuangjuX / Paper-reading
My Paper Reading Lists and Notes.
☆20 Updated 8 months ago
rchardx / cuda-gemm
☆28 Updated 5 months ago
nox-410 / tvm.tl
An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.
☆51 Updated last year
LeiWang1999 / Stream-k.tvm
☆19 Updated 11 months ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆54 Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆53 Updated last year