dlsyscourse / hw1
☆6Updated last week
Related projects:
- ☆21Updated 4 months ago
- Surrogate-based Hyperparameter Tuning System☆26Updated last year
- ☆37Updated 3 years ago
- ☆18Updated last week
- SOTA Learning-augmented Systems☆32Updated 2 years ago
- TensorRT LLM Benchmark Configuration☆10Updated last month
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆19Updated 6 months ago
- ☆14Updated last year
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining☆12Updated 9 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.☆33Updated last year
- GPTQ inference TVM kernel☆35Updated 4 months ago
- ☆13Updated 2 years ago
- Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…☆38Updated last month
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆35Updated 3 months ago
- Simple PyTorch graph capturing.☆13Updated last year
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training☆18Updated 6 months ago
- DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.☆46Updated 3 weeks ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters.☆14Updated 4 months ago
- AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)☆76Updated last year
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines☆13Updated 9 months ago
- ☆19Updated 2 years ago
- A resilient distributed training framework☆78Updated 5 months ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆18Updated last year
- Penn CIS 5650 (GPU Programming and Architecture) Final Project☆21Updated 9 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of Large Language Models has become increasingly important. Here is a list of pap…☆153Updated this week
- ☆15Updated this week
- ☆29Updated last month
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆17Updated 2 years ago
- Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines.☆41Updated 9 months ago
- ☆65Updated 2 years ago