RulinShao / FastCkpt

Python package for rematerialization-aware gradient checkpointing

☆27

Alternatives and similar repositories for FastCkpt

Users that are interested in FastCkpt are comparing it to the libraries listed below

zhuohan123 / terapipe
☆75 · Updated 4 years ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆216 · Updated last year
zhengzangw / Sequence-Scheduling
PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆92 · Updated 2 years ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆41 · Updated 2 years ago
thu-pacman / FasterMoE
☆87 · Updated 3 years ago
DS3Lab / DT-FM
☆93 · Updated 3 years ago
SymbioticLab / Oobleck
A resilient distributed training framework
☆96 · Updated last year
stanford-futuredata / stk
☆112 · Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆246 · Updated 3 weeks ago
exists-forall / striped_attention
☆41 · Updated last year
LiuXiaoxuanPKU / OSD
☆60 · Updated 10 months ago
saareliad / FTPipe
FTPipe and related pipeline model parallelism research.
☆43 · Updated 2 years ago
xuqifan897 / Optimus
☆28 · Updated 4 years ago
Infini-AI-Lab / Sirius
Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its…
☆21 · Updated last year
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Memory profiler that understands more low-level allocations such as NCCL)
☆62 · Updated last month
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆94 · Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆117 · Updated last month
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆221 · Updated 2 years ago
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆114 · Updated 2 weeks ago
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆142 · Updated last week
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆169 · Updated last year
DS3Lab / Decentralized_FM_alpha
☆19 · Updated 2 years ago
MDK8888 / vllmini
A minimal implementation of vllm.
☆60 · Updated last year
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆104 · Updated 7 months ago
awslabs / slapo
A schedule language for large model training
☆151 · Updated 2 months ago
hao-ai-lab / vllm-ltr
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
☆61 · Updated 11 months ago
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 · Updated 5 months ago
hao-ai-lab / MuxServe
☆74 · Updated 2 weeks ago
MayDomine / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆18 · Updated last year
LoongServe / LoongServe
☆124 · Updated 11 months ago