alibaba / hap
☆9 Updated 7 months ago
Related projects
Alternatives and complementary repositories for hap
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices". ☆19 Updated 8 months ago
- SOTA Learning-augmented Systems ☆33 Updated 2 years ago
- A Deep Learning Cluster Scheduler ☆37 Updated 3 years ago
- Stateful LLM Serving ☆38 Updated 3 months ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces. ☆54 Updated 3 months ago
- An Attention Superoptimizer ☆20 Updated 6 months ago
- ☆23 Updated last year
- Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny model can tell you the verbosity of an LLM (… ☆22 Updated 5 months ago
- ☆69 Updated last year
- Artifact of OSDI '24 paper, "Llumnix: Dynamic Scheduling for Large Language Model Serving" ☆57 Updated 5 months ago
- ☆16 Updated 6 months ago
- A Cluster-Wide Model Manager to Accelerate DNN Training via Automated Training Warmup ☆34 Updated last year
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines ☆14 Updated 11 months ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆17 Updated 2 years ago
- Code for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24] ☆16 Updated this week
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications ☆124 Updated 2 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆19 Updated last year
- Model-less Inference Serving ☆82 Updated last year
- Vector search with bounded performance. ☆33 Updated 9 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆75 Updated this week
- ☆19 Updated 2 years ago
- ☆46 Updated 5 months ago
- Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" ☆27 Updated 8 months ago
- A resilient distributed training framework ☆85 Updated 7 months ago
- An interference-aware scheduler for fine-grained GPU sharing ☆111 Updated 6 months ago
- ☆13 Updated 2 years ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆57 Updated 6 months ago
- ☆16 Updated last year
- High performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs ☆48 Updated 2 years ago
- Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs ☆50 Updated last year