ROCm / Megatron-LM
Ongoing research training transformer models at scale
☆28 · Updated this week
Alternatives and similar repositories for Megatron-LM
Users interested in Megatron-LM are comparing it to the libraries listed below.
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 · Updated 2 months ago
- Microsoft Collective Communication Library ☆66 · Updated 10 months ago
- ☆45 · Updated this week
- ☆90 · Updated 10 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆119 · Updated 3 months ago
- ☆119 · Updated 8 months ago
- nnScaler: Compiling DNN models for Parallel Training ☆118 · Updated 3 weeks ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆66 · Updated 6 months ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆93 · Updated last week
- ☆71 · Updated last year
- LLM-Inference-Bench ☆51 · Updated 2 months ago
- AMD RAD's experimental RMA library for Triton. ☆74 · Updated this week
- Boosting 4-bit inference kernels with 2:4 sparsity ☆82 · Updated last year
- A lightweight design for computation-communication overlap. ☆171 · Updated last week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). ☆265 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆83 · Updated this week
- ☆46 · Updated 9 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 · Updated 3 weeks ago
- MAD (Model Automation and Dashboarding) ☆25 · Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆66 · Updated 3 months ago
- Applied AI experiments and examples for PyTorch ☆296 · Updated last month
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆222 · Updated last week
- RCCL Performance Benchmark Tests ☆76 · Updated last week
- Fast and memory-efficient exact attention ☆189 · Updated this week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆222 · Updated 2 years ago
- Extensible collectives library in Triton ☆87 · Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆212 · Updated this week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆216 · Updated last year
- ☆56 · Updated this week
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆97 · Updated 2 weeks ago