intelligent-machine-learning / atorch
An industrial extension library of PyTorch to accelerate large-scale model training
☆36 · Updated last month
Alternatives and similar repositories for atorch
Users interested in atorch are comparing it to the libraries listed below.
- PyTorch bindings for CUTLASS grouped GEMM. ☆127 · Updated 5 months ago
- ☆77 · Updated 2 months ago
- ☆141 · Updated 3 months ago
- A collection of memory efficient attention operators implemented in the Triton language. ☆272 · Updated last year
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆210 · Updated last week
- An easy-to-use package for implementing SmoothQuant for LLMs ☆102 · Updated 2 months ago
- ☆97 · Updated 9 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆113 · Updated last week
- ☆87 · Updated 3 months ago
- Zero Bubble Pipeline Parallelism ☆399 · Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆100 · Updated 3 weeks ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆219 · Updated 2 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆210 · Updated 10 months ago
- UltraScale Playbook (Chinese edition) ☆43 · Updated 3 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆70 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (see the sketch after this list). ☆100 · Updated last year
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆57 · Updated 10 months ago
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆48 · Updated 11 months ago
- A lightweight design for computation-communication overlap. ☆143 · Updated last week
- ☆139 · Updated last year
- ☆105 · Updated 10 months ago
- ☆128 · Updated 6 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap… ☆255 · Updated 3 months ago
- Pipeline Parallelism Emulation and Visualization ☆43 · Updated 2 weeks ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆297 · Updated 7 months ago
- Summary of system papers/frameworks/codes/tools on training or serving large model ☆57 · Updated last year
- ☆84 · Updated 3 years ago
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs. ☆128 · Updated 2 months ago
- ☆36 · Updated 10 months ago
- Summary of some awesome work for optimizing LLM inference ☆77 · Updated 3 weeks ago
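The Roofline-model entry in the list above refers to a standard performance model: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). Below is a minimal, hypothetical Python sketch of that formula; the hardware numbers and sample intensities are illustrative assumptions, not taken from the linked repository.

```python
# Minimal sketch of the Roofline model:
# attainable throughput = min(peak compute, memory bandwidth * arithmetic intensity).
# All numbers below are assumed placeholders for illustration, not measurements.

def roofline_tflops(peak_tflops: float, bandwidth_tbps: float, arithmetic_intensity: float) -> float:
    """Attainable TFLOP/s for a kernel with the given arithmetic intensity (FLOPs per byte)."""
    return min(peak_tflops, bandwidth_tbps * arithmetic_intensity)

# Decode-phase LLM inference is typically memory-bound (low FLOPs/byte),
# while large-batch prefill approaches the compute roof.
peak, bw = 300.0, 2.0  # assumed: 300 TFLOP/s peak compute, 2 TB/s memory bandwidth
for name, intensity in [("decode (batch=1)", 1.0), ("prefill (large batch)", 500.0)]:
    print(f"{name}: ~{roofline_tflops(peak, bw, intensity):.1f} TFLOP/s attainable")
```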