radixark / milesLinks

☆199

Alternatives and similar repositories for miles

Users that are interested in miles are comparing it to the libraries listed below

Sorting:

yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performence drop.
☆269Updated 2 weeks ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆217Updated last year
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆159Updated last week
mit-han-lab / flash-moba
☆143Updated this week
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆154Updated last month
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆99Updated last week
NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models
☆193Updated this week
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆131Updated 11 months ago
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆107Updated 7 months ago
Dao-AILab / grouped-latent-attention
☆130Updated 5 months ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆252Updated 4 months ago
FasterDecoding / TEAL
☆148Updated 9 months ago
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…
☆67Updated 2 months ago
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆128Updated last week
Jingyu6 / speculative_prefill
☆46Updated 6 months ago
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆353Updated 4 months ago
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆171Updated last month
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipefor Near-Lossless Generative Inference of LLM
☆170Updated last year
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆248Updated last month
rlite-project / RLite
A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…
☆81Updated 2 months ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆51Updated 4 months ago
FasterDecoding / SnapKV
☆290Updated 4 months ago
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆112Updated 8 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆248Updated 5 months ago
hao-ai-lab / LookaheadReasoning
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
☆52Updated 3 weeks ago
Infini-AI-Lab / MagicPIG
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆240Updated 11 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆130Updated 5 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆206Updated 5 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆308Updated last week