NUS-HPC-AI-Lab / oh-my-server
☆30 · Updated last year
Related projects
Alternatives and complementary repositories for oh-my-server
- Performance benchmarking with ColossalAI ☆39 · Updated 2 years ago
- ☆65 · Updated 3 years ago
- ☆41 · Updated 2 years ago
- ☆39 · Updated 3 years ago
- Zero Bubble Pipeline Parallelism ☆279 · Updated last week
- Accuracy 77%. Large batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution. Optional acc… ☆38 · Updated 3 years ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆67 · Updated 3 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆96 · Updated this week
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆75 · Updated last week
- Puzzles for learning Triton; play with minimal environment configuration! ☆61 · Updated this week
- ☆70 · Updated 2 years ago
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆15 · Updated 6 months ago
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆105 · Updated 10 months ago
- ☆21 · Updated 3 months ago
- PyTorch implementation of LAMB for ImageNet/ResNet-50 training ☆14 · Updated 3 years ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆34 · Updated 2 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆64 · Updated 2 weeks ago
- Quantized Attention on GPU ☆29 · Updated last week
- Odysseus: Playground of LLM Sequence Parallelism ☆55 · Updated 4 months ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616 ☆129 · Updated last year
- Official implementation of ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆41 · Updated 4 months ago
- Since the emergence of ChatGPT in 2022, the acceleration of large language models has become increasingly important. Here is a list of pap… ☆168 · Updated last week
- ☆88 · Updated 2 months ago
- pytorch-profiler ☆49 · Updated last year
- Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. ☆44 · Updated 11 months ago
- [USENIX ATC '24] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Paral… ☆46 · Updated 3 months ago
- ☆26 · Updated 3 years ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021 ☆54 · Updated 3 years ago
- ☆74 · Updated 3 weeks ago
- ATC23 AE ☆43 · Updated last year