TheCoreTeam / core_scheduler
CoreScheduler: A High-Performance Scheduler for Large Model Training
☆21 · Updated 4 months ago
Alternatives and similar repositories for core_scheduler:
Users interested in core_scheduler are comparing it to the libraries listed below.
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices" ☆19 · Updated 10 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆21 · Updated last month
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆18 · Updated 3 years ago
- ☆36 · Updated this week
- ThrillerFlow is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Updated last month
- Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆22 · Updated last month
- [IJCAI2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆51 · Updated last year
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 · Updated 2 years ago
- Stateful LLM Serving ☆44 · Updated 5 months ago
- ☆24 · Updated 3 weeks ago
- ☆12 · Updated 2 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, achieve peak⚡️ performance ☆43 · Updated this week
- nnScaler: Compiling DNN models for Parallel Training ☆87 · Updated last week
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines ☆17 · Updated last year
- ☆16 · Updated 6 months ago
- ☆48 · Updated 7 months ago
- ☆18 · Updated 2 years ago
- Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling ☆9 · Updated 10 months ago
- ☆34 · Updated 2 months ago
- ☆9 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆24 · Updated last month
- ☆72 · Updated 2 years ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆27 · Updated 2 months ago
- ☆24 · Updated last year
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆16 · Updated this week
- ☆35 · Updated last month
- Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" ☆29 · Updated 10 months ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23) ☆80 · Updated last year
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Updated last year
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c… ☆23 · Updated 2 years ago