microsoft / varuna
☆245 · Updated 6 months ago
Alternatives and similar repositories for varuna:
Users who are interested in varuna are comparing it to the libraries listed below
- Implementation of a Transformer, but completely in Triton · ☆253 · Updated 2 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… · ☆153 · Updated last month
- Research and development for optimizing transformers · ☆125 · Updated 3 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆186 · Updated last week
- ☆97 · Updated 5 months ago
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers · ☆204 · Updated 5 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference · ☆378 · Updated 2 months ago
- Applied AI experiments and examples for PyTorch · ☆215 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX · ☆219 · Updated 5 months ago
- A library to analyze PyTorch traces. · ☆325 · Updated this week
- ☆70 · Updated 3 years ago
- Torch Distributed Experimental · ☆115 · Updated 5 months ago
- A schedule language for large model training · ☆143 · Updated 7 months ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … · ☆86 · Updated last week
- Triton-based implementation of Sparse Mixture of Experts. · ☆194 · Updated 2 months ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) · ☆116 · Updated 3 years ago
- ☆114 · Updated 10 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe · ☆102 · Updated 2 months ago
- Pipeline Parallelism for PyTorch · ☆739 · Updated 5 months ago
- extensible collectives library in triton · ☆77 · Updated 4 months ago
- ☆278 · Updated last week
- Explorations into some recent techniques surrounding speculative decoding · ☆233 · Updated last month
- ☆156 · Updated 7 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ☆216 · Updated this week
- Fast low-bit matmul kernels in Triton · ☆199 · Updated last week
- ☆157 · Updated last year
- Cataloging released Triton kernels. · ☆157 · Updated 2 weeks ago
- Zero Bubble Pipeline Parallelism · ☆317 · Updated 2 months ago
- Large Context Attention · ☆677 · Updated this week
- ☆92 · Updated 2 years ago