NUS-HPC-AI-Lab / oh-my-server
☆30 Updated last year
Alternatives and similar repositories for oh-my-server:
Users who are interested in oh-my-server are comparing it to the repositories listed below.
- ☆41 Updated 2 years ago
- Accuracy 77%. Large batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution. Optional acc… ☆38 Updated 3 years ago
- Performance benchmarking with ColossalAI ☆39 Updated 2 years ago
- ☆70 Updated 3 years ago
- ☆72 Updated 2 years ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616 ☆132 Updated last year
- pytorch-profiler ☆50 Updated last year
- Odysseus: Playground of LLM Sequence Parallelism ☆64 Updated 7 months ago
- ☆55 Updated 3 weeks ago
- nnScaler: Compiling DNN models for Parallel Training ☆87 Updated 3 weeks ago
- ☆97 Updated 5 months ago
- Quantized Attention on GPU ☆34 Updated 2 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆61 Updated 2 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆77 Updated 2 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆102 Updated 2 months ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021 ☆55 Updated 3 years ago
- PyTorch implementation of LAMB for ImageNet/ResNet-50 training ☆13 Updated 3 years ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆34 Updated 2 months ago
- Efficient 2:4 sparse training algorithms and implementations ☆46 Updated last month
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆106 Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ☆86 Updated 3 weeks ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆37 Updated 2 years ago
- Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines. ☆52 Updated last year
- Memory footprint reduction for transformer models ☆11 Updated 2 years ago
- ☆26 Updated 3 years ago
- Zero Bubble Pipeline Parallelism ☆317 Updated 2 months ago
- Python package for rematerialization-aware gradient checkpointing ☆24 Updated last year
- FireFlyer Record file format, writer and reader for DL training samples. ☆129 Updated 2 years ago
- High Performance Grouped GEMM in PyTorch ☆24 Updated 2 years ago
- ☆79 Updated 2 months ago