meta-pytorch/MSLK

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/meta-pytorch/MSLK)

meta-pytorch / MSLK

MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI training and inference, such as FP8 row-wise quantization and collective communications.

☆55

Alternatives and similar repositories for MSLK

Users that are interested in MSLK are comparing it to the libraries listed below

Sorting:

simveit / persistent_dense_gemm
View on GitHub
Persistent dense gemm for Hopper in `CuTeDSL`
☆15Aug 9, 2025Updated 6 months ago
Oneflow-Inc / one-fx
View on GitHub
A toolkit for developers to simplify the transformation of nn.Module instances. It's now corresponding to Pytorch.fx.
☆13Apr 7, 2023Updated 2 years ago
NVIDIA / hoti-2025-gpu-comms-tutorial
View on GitHub
Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025
☆31Oct 22, 2025Updated 4 months ago
AIS-SNU / PathWeaver
View on GitHub
A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search
☆21Jul 22, 2025Updated 7 months ago
flagos-ai / libtriton_jit
View on GitHub
A Triton JIT runtime and ffi provider in C++
☆31Feb 24, 2026Updated last week
IST-DASLab / gemm-fp8
View on GitHub
High Performance FP8 GEMM Kernels for SM89 and later GPUs.
☆20Jan 24, 2025Updated last year
meta-pytorch / BackendBench
View on GitHub
Ship correct and fast LLM kernels to PyTorch
☆142Jan 14, 2026Updated last month
Ascend / torchair
View on GitHub
☆24Updated this week
lemyx / tilelang-dsa
View on GitHub
DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang
☆44Nov 19, 2025Updated 3 months ago
meta-pytorch / kraken
View on GitHub
Triton-based Symmetric Memory operators and examples
☆85Jan 15, 2026Updated last month
LeiWang1999 / AutoGPTQ.tvm
View on GitHub
GPTQ inference TVM kernel
☆40Apr 25, 2024Updated last year
facebookresearch / CUTracer
View on GitHub
A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.
☆35Updated this week
megvii-research / IntLLaMA
View on GitHub
IntLLaMA: A fast and light quantization solution for LLaMA
☆18Jul 21, 2023Updated 2 years ago
IBM / triton-dejavu
View on GitHub
Framework to reduce autotune overhead to zero for well known deployments.
☆97Sep 19, 2025Updated 5 months ago
Chtholly-Boss / swizzle
View on GitHub
A practical way of learning Swizzle
☆37Feb 3, 2025Updated last year
meta-pytorch / tlparse
View on GitHub
TORCH_TRACE parser for PT2
☆78Updated this week
tlc-pack / cutlass_fpA_intB_gemm
View on GitHub
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96Feb 20, 2026Updated last week
ROCm / FlyDSL
View on GitHub
☆111Updated this week
redplait / denvdis
View on GitHub
NVidia sass disassembler/inline patcher
☆43Updated this week
facebookresearch / any4
View on GitHub
Quantize transformers to any learned arbitrary 4-bit numeric format
☆51Jan 25, 2026Updated last month
flashinfer-ai / cutlass-viz
View on GitHub
☆65Apr 26, 2025Updated 10 months ago
perplexityai / pplx-kernels
View on GitHub
Perplexity GPU Kernels
☆567Nov 7, 2025Updated 3 months ago
ds2-lab / infinistore
View on GitHub
InfiniStore: an elastic serverless cloud storage system (VLDB'23)
☆24May 5, 2023Updated 2 years ago
apuaaChen / EVT_AE
View on GitHub
Artifacts of EVT ASPLOS'24
☆29Mar 6, 2024Updated last year
ademeure / DeeperGEMM
View on GitHub
DeeperGEMM: crazy optimized version
☆74May 5, 2025Updated 9 months ago
gfvvz / triton-learning-materials
View on GitHub
Triton Compiler related materials.
☆42Jan 4, 2025Updated last year
dame-cell / Triformer
View on GitHub
Transformers components but in Triton
☆34May 9, 2025Updated 9 months ago
ademeure / cuda-side-boost
View on GitHub
☆53Feb 24, 2026Updated last week
ArthurinRUC / cutlass-notes
View on GitHub
From Minimal GEMM to Everything
☆163Feb 10, 2026Updated 3 weeks ago
matinraayai / Luthier
View on GitHub
Luthier, a GPU binary instrumentation tool for AMD GPUs
☆27Feb 21, 2026Updated last week
cchan / tccl
View on GitHub
extensible collectives library in triton
☆95Mar 31, 2025Updated 11 months ago
ai-dynamo / nixl
View on GitHub
NVIDIA Inference Xfer Library (NIXL)
☆898Updated this week
ColfaxResearch / cutlass-kernels
View on GitHub
☆261Jul 11, 2024Updated last year
dalibra / RECE
View on GitHub
The repository provides code for the paper RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders, CIKM'24
☆11Oct 21, 2024Updated last year
lixiuhong / batched_gemm
View on GitHub
☆40Feb 28, 2020Updated 6 years ago
pssrawat / ppopp-artifact
View on GitHub
Artifact for 'Register Optimizations for Stencils on GPUs'
☆10Sep 18, 2018Updated 7 years ago
prajna-lang / prajna
View on GitHub
a simple general program language
☆100Feb 2, 2026Updated last month
IST-DASLab / qutlass
View on GitHub
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
☆168Nov 11, 2025Updated 3 months ago
Qualcomm-AI-research / FP8-quantization
View on GitHub
☆169Mar 9, 2023Updated 2 years ago