intelligent-machine-learning / atorch
An industrial extension library for PyTorch to accelerate large-scale model training
☆39 · Updated this week
Alternatives and similar repositories for atorch
Users who are interested in atorch are comparing it to the libraries listed below.
- A collection of memory efficient attention operators implemented in the Triton language. ☆276 · Updated last year
- ByteCheckpoint: An Unified Checkpointing Library for LFMs ☆234 · Updated last month
- PyTorch bindings for CUTLASS grouped GEMM. ☆134 · Updated last month
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆222 · Updated last week
- Zero Bubble Pipeline Parallelism ☆417 · Updated 3 months ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆338 · Updated this week
- ☆145 · Updated 5 months ago
- Pipeline Parallelism Emulation and Visualization ☆57 · Updated 2 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆294 · Updated this week
- Distributed IO-aware Attention algorithm ☆21 · Updated 11 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆76 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆213 · Updated 11 months ago
- ☆92 · Updated 4 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆546 · Updated 3 weeks ago
- ☆270 · Updated this week
- UltraScale Playbook, Chinese edition ☆64 · Updated 5 months ago
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆48 · Updated last year
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs. ☆138 · Updated 4 months ago
- Toolchain built around the Megatron-LM for Distributed Training ☆58 · Updated last week
- ☆123 · Updated 2 months ago
- An easy-to-use package for implementing SmoothQuant for LLMs ☆104 · Updated 4 months ago
- Allow torch tensor memory to be released and resumed later ☆106 · Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆108 · Updated 2 months ago
- ☆128 · Updated 7 months ago
- ☆78 · Updated 3 months ago
- Materials for learning SGLang ☆525 · Updated 3 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆202 · Updated this week
- ☆389 · Updated this week
- 青稞Talk ☆68 · Updated this week
- Fast and memory-efficient exact attention ☆86 · Updated last week