DS3Lab / AC-SGD
Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**.
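For context on the paper itself: its central idea is to compress the *change* in activations between successive training iterations, rather than the raw activations exchanged between pipeline stages, since that change shrinks as fine-tuning converges. The sketch below illustrates the idea in PyTorch; it is a minimal illustration under my own naming, not the repo's actual API.

```python
import torch

# Hypothetical sketch of delta-based activation compression (names are
# mine, not the repo's API). Instead of quantizing the activation that a
# pipeline stage sends downstream, quantize how much it changed since
# the previous iteration.
def compress_delta(activation, cache, key, bits=4):
    prev = cache.get(key, torch.zeros_like(activation))
    delta = activation - prev
    # Per-tensor uniform quantization of the delta.
    levels = 2 ** (bits - 1) - 1
    scale = delta.abs().max().clamp(min=1e-8) / levels
    q = torch.clamp((delta / scale).round(), -levels, levels).to(torch.int8)
    # Cache the *dequantized* reconstruction, so the sender tracks
    # exactly what the receiver will reconstruct.
    cache[key] = prev + q.float() * scale
    return q, scale  # this pair is what crosses the slow link

def decompress_delta(q, scale, cache, key):
    prev = cache.get(key, torch.zeros_like(q, dtype=torch.float32))
    recovered = prev + q.float() * scale
    cache[key] = recovered
    return recovered
```

The sender transmits only `q` and `scale`; both sides keep the same dequantized reconstruction in their caches, so quantization error does not drift between them.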
Related projects:
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (ICML 2021)
- PyTorch implementation of the ICML 2024 paper "CaM: Cache Merging for Memory-efficient LLMs Inference"
- Implementation of the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning"
- Official repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
- An efficient and general framework for layerwise-adaptive gradient compression
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
- Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
- Official implementation of the NeurIPS 2020 paper "Sparse Weight Activation Training".
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms
- CoreScheduler: A High-Performance Scheduler for Large Model Training
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
- [ICLR 2023] "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" by Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, et al.
- Memory footprint reduction for transformer models
- BitPack is a practical tool for efficiently saving ultra-low-precision/mixed-precision quantized models (see the bit-packing sketch after this list).
- An external memory allocator example for PyTorch.
- 16-fold memory access reduction with nearly no loss
- Official repository for the paper "DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines"
- Python package for rematerialization-aware gradient checkpointing
- Source code for "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs"
- Official repository for the IPDPS 2024 paper "QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
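For the BitPack entry above, here is a generic illustration of the bit-packing idea such tools build on: storing four 2-bit quantized values per byte instead of one per byte. The function names and layout are hypothetical, not BitPack's actual format or API.

```python
import torch

def pack_2bit(q):
    """Pack four 2-bit values (0..3, stored as uint8) into each output byte."""
    assert q.dtype == torch.uint8 and int(q.max()) < 4
    q = q.flatten()
    pad = (-q.numel()) % 4  # pad length up to a multiple of 4
    if pad:
        q = torch.cat([q, q.new_zeros(pad)])
    q = q.view(-1, 4)
    return q[:, 0] | (q[:, 1] << 2) | (q[:, 2] << 4) | (q[:, 3] << 6)

def unpack_2bit(packed, numel):
    """Invert pack_2bit, truncating the padding back off."""
    parts = [(packed >> s) & 0x3 for s in (0, 2, 4, 6)]
    return torch.stack(parts, dim=1).flatten()[:numel]
```

Under this layout, a 2-bit tensor of n elements occupies ⌈n/4⌉ bytes (plus whatever quantization scales accompany it), which is where the storage savings of ultra-low-precision checkpoints come from.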