ROCm / Megatron-LM
Ongoing research training transformer models at scale
☆25 · Updated last week
Alternatives and similar repositories for Megatron-LM
Users interested in Megatron-LM are comparing it to the libraries listed below.
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆61 · Updated 2 weeks ago
- Microsoft Collective Communication Library ☆64 · Updated 7 months ago
- ☆83 · Updated 8 months ago
- ☆40 · Updated this week
- ☆21 · Updated last week
- RCCL Performance Benchmark Tests ☆70 · Updated this week
- ☆100 · Updated 6 months ago
- ☆37 · Updated 7 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆101 · Updated last month
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). ☆255 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆77 · Updated this week
- ☆64 · Updated last year
- A lightweight design for computation-communication overlap. ☆148 · Updated 3 weeks ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆40 · Updated 2 years ago
- LLM-Inference-Bench ☆45 · Updated last month
- A hierarchical collective communications library with portable optimizations ☆35 · Updated 7 months ago
- DeepSeek-V3/R1 inference performance simulator ☆155 · Updated 3 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆113 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆85 · Updated this week
- LLM Inference analyzer for different hardware platforms ☆80 · Updated last week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆30 · Updated 4 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆189 · Updated this week
- Fast and memory-efficient exact attention ☆177 · Updated this week
- Extensible collectives library in Triton ☆87 · Updated 3 months ago
- A CUTLASS implementation using SYCL ☆31 · Updated this week
- ☆106 · Updated 8 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆206 · Updated this week
- ☆225 · Updated last week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆216 · Updated last year
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆67 · Updated 3 months ago