BaguaSys / tutorials
Bagua tutorials.
☆12 · Updated 2 years ago
Alternatives and similar repositories for tutorials:
Users interested in tutorials are comparing it to the libraries listed below.
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … · ☆63 · Updated 2 years ago
- PyTorch code examples for measuring the performance of collective communication calls in AI workloads · ☆12 · Updated 2 months ago
- Memory Optimizations for Deep Learning (ICML 2023) · ☆62 · Updated 10 months ago
- High performance NCCL plugin for Bagua. · ☆15 · Updated 3 years ago
- An IR for efficiently simulating distributed ML computation. · ☆25 · Updated last year
- ☆28 · Updated 3 weeks ago
- ☆23 · Updated last month
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions · ☆21 · Updated 2 years ago
- ☆38 · Updated last year
- ☆57 · Updated 7 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API · ☆73 · Updated last year
- Distributed ML Optimizer · ☆30 · Updated 3 years ago
- An Attention Superoptimizer · ☆20 · Updated 8 months ago
- Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated wr… · ☆8 · Updated last year
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. · ☆39 · Updated 2 months ago
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. · ☆27 · Updated 4 months ago
- ☆23 · Updated this week
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (ICML 2021) · ☆55 · Updated 3 years ago
- ☆11 · Updated 3 years ago
- ☆22 · Updated 5 years ago
- TensorRT LLM Benchmark Configuration · ☆12 · Updated 5 months ago
- DLPack for Tensorflow · ☆36 · Updated 4 years ago
- CUDA 12.2 HMM demos · ☆20 · Updated 5 months ago
- Ahead of Time (AOT) Triton Math Library · ☆50 · Updated this week
- ☆48 · Updated 10 months ago
- MLPerf™ logging library · ☆32 · Updated last week
- Cavs: An Efficient Runtime System for Dynamic Neural Networks · ☆13 · Updated 4 years ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems · ☆58 · Updated 2 years ago
- Development repository for integrating FlexFlow (A distributed deep learning framework that supports flexible parallelization strategies)… · ☆28 · Updated 3 years ago