ROCm / Megatron-LM
Ongoing research training transformer models at scale
☆29 · Updated last week
Alternatives and similar repositories for Megatron-LM
Users interested in Megatron-LM are comparing it to the libraries listed below.
- Microsoft Collective Communication Library · ☆67 · Updated 9 months ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. · ☆32 · Updated 5 months ago
- PyTorch bindings for CUTLASS grouped GEMM. · ☆110 · Updated 3 months ago
- ☆88 · Updated 9 months ago
- ☆42 · Updated this week
- RCCL Performance Benchmark Tests · ☆73 · Updated last week
- ☆69 · Updated last year
- nnScaler: Compiling DNN models for Parallel Training · ☆118 · Updated this week
- Applied AI experiments and examples for PyTorch · ☆292 · Updated last week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). · ☆261 · Updated last month
- ☆111 · Updated 8 months ago
- Extensible collectives library in Triton · ☆88 · Updated 5 months ago
- PyTorch library for cost-effective, fast and easy serving of MoE models. · ☆228 · Updated last month
- Fast and memory-efficient exact attention · ☆184 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… · ☆62 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆94 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆209 · Updated last week
- ☆47 · Updated 8 months ago
- LLM-Inference-Bench · ☆50 · Updated last month
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. · ☆40 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆81 · Updated this week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training · ☆215 · Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems · ☆79 · Updated 9 months ago
- ☆74 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton · ☆356 · Updated last week
- A lightweight design for computation-communication overlap. · ☆160 · Updated last week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity · ☆218 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆81 · Updated 11 months ago
- Development repository for the Triton language and compiler · ☆127 · Updated this week
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving · ☆56 · Updated this week