ROCm / Megatron-LM
Ongoing research training transformer models at scale
☆31 Updated this week
Alternatives and similar repositories for Megatron-LM
Users interested in Megatron-LM are comparing it to the libraries listed below:
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆63 Updated 4 months ago
- Microsoft Collective Communication Library ☆66 Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆108 Updated this week
- MAD (Model Automation and Dashboarding) ☆29 Updated last week
- RCCL Performance Benchmark Tests ☆78 Updated last week
- AMD RAD's Triton-based framework for seamless multi-GPU programming ☆101 Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆35 Updated 2 months ago
- Fast and memory-efficient exact attention ☆198 Updated 2 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆126 Updated 5 months ago
- Extensible collectives library in Triton ☆90 Updated 7 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆164 Updated 3 weeks ago
- nnScaler: Compiling DNN models for Parallel Training ☆118 Updated last month
- torchcomms: a modern PyTorch communications API ☆245 Updated this week
- oneCCL Bindings for PyTorch* (deprecated) ☆102 Updated this week
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆123 Updated last week
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆42 Updated 3 years ago
- AI Tensor Engine for ROCm ☆296 Updated this week
- Development repository for the Triton language and compiler ☆136 Updated this week
- Applied AI experiments and examples for PyTorch ☆302 Updated 2 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆80 Updated 11 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆55 Updated 3 weeks ago
- Ahead of Time (AOT) Triton Math Library ☆81 Updated this week
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆67 Updated 7 months ago
- GitHub mirror of the triton-lang/triton repo. ☆98 Updated this week