intelligent-machine-learning / atorch

An industrial extension library of PyTorch to accelerate large-scale model training

☆49

Alternatives and similar repositories for atorch

Users who are interested in atorch are comparing it to the libraries listed below.

NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆241 Updated 2 months ago
FlagOpen / FlagAttention
A collection of memory-efficient attention operators implemented in the Triton language.
☆281 Updated last year
HandH1998 / QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
☆144 Updated last month
Gaffey / ExCP
Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking".
☆48 Updated last year
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆362 Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆428 Updated this week
madsys-dev / deepseekv2-profile
☆148 Updated 7 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆154 Updated last week
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78 Updated last year
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆107 Updated 6 months ago
NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models
☆125 Updated this week
InternLM / turbomind
☆96 Updated 6 months ago
OpenPPL / ppl.llm.serving
☆129 Updated 9 months ago
qingkelab / qingketalk
青稞Talk
☆151 Updated this week
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆67 Updated 4 months ago
xlite-dev / ffpa-attn
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
☆223 Updated 2 months ago
MayDomine / Burst-Attention
Distributed IO-aware Attention algorithm
☆21 Updated 3 weeks ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆216 Updated last year
FlagOpen / FlagCX
☆91 Updated this week
ByteDance-Seed / cudaLLM
☆120 Updated 2 months ago
rlite-project / RLite
A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…
☆68 Updated last month
feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆576 Updated this week
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆432 Updated 5 months ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: A Unified Checkpointing Library for LFMs
☆249 Updated 3 months ago
RiseAI-Sys / DAX
High performance inference engine for diffusion models
☆94 Updated last month
ByteDance-Seed / decoupleQ
A quantization algorithm for LLM
☆143 Updated last year
microsoft / chunk-attention
☆78 Updated 6 months ago
ModelTC / EasyLLM
Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing …
☆48 Updated last year
stepfun-ai / Step3
☆428 Updated 2 months ago
Dao-AILab / grouped-latent-attention
☆130 Updated 4 months ago