liangyuwang / Tiny-MegatronLinks

Tiny-Megatron, a minimalistic re-implementation of the Megatron library

☆17

Alternatives and similar repositories for Tiny-Megatron

Users that are interested in Tiny-Megatron are comparing it to the libraries listed below

Sorting:

dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆107Updated this week
attention-survey / Efficient_Attention_Survey
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
☆221Updated 2 months ago
ifromeast / AI_analysis
analyse problems of AI with Math and Code
☆27Updated 3 months ago
mdy666 / mdy_triton
☆148Updated 4 months ago
InternLM / Awesome-LLM-Training-System
☆43Updated last year
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆96Updated 10 months ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆110Updated last month
TransferQueue / TransferQueue
☆44Updated this week
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆128Updated 2 weeks ago
liangyuwang / Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
☆48Updated 2 months ago
PKUFlyingPig / MIT6.5940_TinyML
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
☆60Updated 10 months ago
PKU-SEC-Lab / HybriMoE
[DAC'25] Official implement of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
☆87Updated 5 months ago
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆63Updated last year
Gaffey / ExCP
Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".
☆48Updated last year
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆143Updated last month
zhuzilin / flash-attention-with-sink
☆39Updated 3 months ago
shishishu / LLM-Inference-Acceleration
LLM Inference with Deep Learning Accelerator.
☆53Updated 9 months ago
tile-ai / AttentionEngine
☆50Updated 5 months ago
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆83Updated last month
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆106Updated 7 months ago
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆78Updated this week
madsys-dev / deepseekv2-profile
☆149Updated 8 months ago
thu-pacman / SmartMoE-AE
ATC23 AE
☆47Updated 2 years ago
sii-research / siiRL
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems
☆224Updated this week
RLsys-Foundation / APRIL
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra…
☆41Updated last month
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performence drop.
☆268Updated last week
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mix sparse patterns
☆366Updated 9 months ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆152Updated last month
ByteDance-Seed / cudaLLM
☆120Updated 2 months ago
infinigence / HamiltonAttention
☆31Updated 3 weeks ago