meta-pytorch/kraken

Triton-based Symmetric Memory operators and examples

☆86

Alternatives and similar repositories for kraken

Users who are interested in kraken compare it to the libraries listed below.

sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆21 · Feb 9, 2026 · Updated 3 weeks ago
cchan / tccl
extensible collectives library in triton
☆96 · Mar 31, 2025 · Updated 11 months ago
smartnets / dataloader-benchmarks
DL Dataloader Benchmarks
☆20 · Jan 27, 2025 · Updated last year
hao-ai-lab / DistCA
Efficient Long-context Language Model Training by Core Attention Disaggregation
☆92 · Updated this week
meta-pytorch / MSLK
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…
☆55 · Updated this week
malfet / llm_experiments
☆12 · Aug 26, 2025 · Updated 6 months ago
wu-kan / wuk_cupti_wrapper
a simple API to use CUPTI
☆11 · Aug 19, 2025 · Updated 6 months ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆327 · Updated this week
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆436 · Feb 1, 2026 · Updated last month
HydraQYH / expert_specialization_moe
Expert Specialization MoE Solution based on CUTLASS
☆27 · Jan 19, 2026 · Updated last month
ademeure / QuickRunCUDA
☆16 · Feb 24, 2026 · Updated last week
bertmaher / tf32_gemm
Example of binding a TF32 CUTLASS GEMM kernel to PyTorch
☆12 · Jun 7, 2024 · Updated last year
tile-ai / AttentionEngine
☆52 · May 19, 2025 · Updated 9 months ago
thomaschlt / mla.c
Implementation from scratch in C of the Multi-head latent attention used in the Deepseek-v3 technical paper.
☆18 · Jan 15, 2025 · Updated last year
meta-pytorch / tlparse
TORCH_TRACE parser for PT2
☆78 · Feb 26, 2026 · Updated last week
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆838 · Updated this week
bertmaher / llama2.so
Inference Llama 2 with a model compiled to native code by TorchInductor
☆14 · Feb 8, 2024 · Updated 2 years ago
triton-lang / kernels
☆105 · Nov 7, 2024 · Updated last year
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆774 · Updated this week
ademeure / cuda-side-boost
☆53 · Feb 24, 2026 · Updated last week
ColfaxResearch / cutlass-kernels
☆262 · Jul 11, 2024 · Updated last year
alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆20 · Jul 18, 2024 · Updated last year
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆14 · Nov 23, 2024 · Updated last year
manishucsd / py-codegen
☆16 · Sep 24, 2024 · Updated last year
ByteDance-Seed / StragglerAnalysis
☆51 · Apr 30, 2025 · Updated 10 months ago
nicolaswilde / amx-gemm-handwritten
Handwritten GEMM using Intel AMX (Advanced Matrix Extension)
☆17 · Jan 11, 2025 · Updated last year
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66 · Mar 24, 2025 · Updated 11 months ago
meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆144 · Jan 14, 2026 · Updated last month
yifuwang / symm-mem-recipes
☆160 · Dec 27, 2024 · Updated last year
xdit-project / DistVAE
A parallelism VAE avoids OOM for high resolution image generation
☆85 · Aug 4, 2025 · Updated 7 months ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆21 · Aug 3, 2025 · Updated 7 months ago
InternLM / Awesome-LLM-Training-System
☆48 · Aug 6, 2024 · Updated last year
BBuf / megatron-lm-parallel-group-playground
☆16 · Mar 30, 2024 · Updated last year
zhuzilin / flash-attention-with-sink
☆38 · Aug 7, 2025 · Updated 6 months ago
sgugger / torchdynamo-tests
☆20 · Nov 23, 2022 · Updated 3 years ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆270 · Feb 2, 2026 · Updated last month
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆194 · Jan 28, 2025 · Updated last year
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆143 · Dec 4, 2024 · Updated last year
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆482 · Jan 8, 2026 · Updated last month