TheCoreTeam / core_scheduler
CoreScheduler: A High-Performance Scheduler for Large Model Training
☆20 · Updated last month
Related projects:
- Official repository for "QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices" (IPDPS'24). ☆19 · Updated 6 months ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆31 · Updated last year
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…). ☆23 · Updated last year
- Code associated with the paper **Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees**. ☆24 · Updated last year
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆33 · Updated last year
- MobiSys#114 ☆21 · Updated last year
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining ☆12 · Updated 9 months ago
- SOTA Learning-augmented Systems ☆32 · Updated 2 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆17 · Updated 2 years ago
- An Attention Superoptimizer ☆19 · Updated 4 months ago
- Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" ☆25 · Updated 6 months ago
- [IJCAI 2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆51 · Updated last year
- Create tiny ML systems for on-device learning. ☆20 · Updated 3 years ago
- Stateful LLM Serving ☆25 · Updated last month
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆14 · Updated 4 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆35 · Updated 3 months ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆11 · Updated 3 months ago
- Surrogate-based Hyperparameter Tuning System ☆26 · Updated last year
- Official repository for the paper "DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines" ☆13 · Updated 9 months ago
- Cupcake: A Compression Scheduler for Scalable Communication-Efficient Distributed Training (MLSys '23) ☆8 · Updated last year
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆92 · Updated 6 months ago
- An external memory allocator example for PyTorch. ☆13 · Updated 2 years ago
- Memory footprint reduction for transformer models ☆11 · Updated last year
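Several of the projects above (e.g. Ok-Topk, Cupcake) center on gradient compression for communication-efficient distributed training. As background, here is a minimal sketch of plain top-k gradient sparsification — the generic building block, not Ok-Topk's actual sparse allreduce algorithm; the helper names `topk_sparsify` and `densify` are illustrative, not from any of these repos:

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a gradient tensor.

    Returns (flat_indices, values). In a real training loop the dropped
    entries would typically be accumulated into an error-feedback buffer
    and added back to the next step's gradient.
    """
    flat = grad.ravel()
    # argpartition selects the k largest magnitudes in O(n) without a full sort
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def densify(indices: np.ndarray, values: np.ndarray, shape) -> np.ndarray:
    """Rebuild a dense tensor from the sparse (indices, values) pair."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

grad = np.array([[0.1, -2.0, 0.05],
                 [1.5, -0.2, 0.3]])
idx, vals = topk_sparsify(grad, k=2)
restored = densify(idx, vals, grad.shape)
# only the two largest-magnitude entries (-2.0 and 1.5) survive
```

Exchanging `(idx, vals)` instead of the dense tensor is what shrinks communication volume; the hard part these projects tackle is doing the subsequent allreduce efficiently when each worker selects a different index set.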