linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

☆5,842

Alternatives and similar repositories for Liger-Kernel

Users who are interested in Liger-Kernel often compare it to the repositories listed below.

pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆4,719 Updated this week
huggingface / picotron
Minimalistic 4D-parallelism distributed training framework for educational purposes
☆1,892 Updated 2 months ago
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆4,099 Updated this week
facebookresearch / lingua
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
☆4,730 Updated 4 months ago
srush / Triton-Puzzles
Puzzles for learning Triton
☆2,116 Updated last year
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,323 Updated 2 months ago
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,511 Updated this week
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆2,937 Updated this week
fla-org / flash-linear-attention
🚀 Efficient implementations of state-of-the-art linear attention models
☆3,886 Updated this week
KellerJordan / modded-nanogpt
NanoGPT (124M) in 3 minutes
☆3,822 Updated this week
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,347 Updated 4 months ago
meta-pytorch / torchtune
PyTorch native post-training library
☆5,595 Updated this week
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆3,945 Updated this week
huggingface / nanoVLM
The simplest, fastest repository for training/finetuning small-sized VLMs.
☆4,294 Updated 3 weeks ago
ModelTC / LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…
☆3,730 Updated this week
jiaweizzhao / GaLore
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
☆1,623 Updated last year
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆2,925 Updated this week
policy-gradient / GRPO-Zero
Implementing DeepSeek R1's GRPO algorithm from scratch
☆1,670 Updated 7 months ago
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆2,238 Updated last week
KellerJordan / Muon
Muon is an optimizer for hidden layers in neural networks
☆2,028 Updated 4 months ago
mirage-project / mirage
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
☆1,951 Updated this week
ridgerchu / matmulfreellm
Implementation for MatMul-free LM.
☆3,037 Updated 4 months ago
gpu-mode / lectures
Material for gpu-mode lectures
☆5,310 Updated last month
allenai / open-instruct
AllenAI's post-training codebase
☆3,317 Updated this week
meta-pytorch / gpt-fast
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
☆6,152 Updated 3 months ago
huggingface / datatrove
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
☆2,742 Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆4,283 Updated this week
gpu-mode / resource-stream
GPU programming related news and material links
☆1,795 Updated 2 months ago
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆5,490 Updated this week
sgl-project / sglang
SGLang is a fast serving framework for large language models and vision language models.
☆20,253 Updated this week