microsoft / varuna
★248 · Updated 8 months ago
Alternatives and similar repositories for varuna:
Users interested in varuna are comparing it to the libraries listed below:
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… (★155, updated 3 months ago)
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. (★190, updated this week)
- Implementation of a Transformer, but completely in Triton (★263, updated 2 years ago)
- Research and development for optimizing transformers (★125, updated 4 years ago)
- Applied AI experiments and examples for PyTorch (★251, updated last week)
- A library to analyze PyTorch traces. (★355, updated this week)
- This repository contains the experimental PyTorch native float8 training UX (★222, updated 8 months ago)
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers (★207, updated 7 months ago)
- ★102, updated 7 months ago
- A schedule language for large model training (★144, updated 9 months ago)
- Triton-based implementation of Sparse Mixture of Experts. (★209, updated 4 months ago)
- ★72, updated 3 years ago
- ★192, updated this week
- Fast low-bit matmul kernels in Triton (★275, updated this week)
- Torch Distributed Experimental (★115, updated 7 months ago)
- Extensible collectives library in Triton (★84, updated this week)
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) (★116, updated 3 years ago)
- Cataloging released Triton kernels. (★212, updated 2 months ago)
- Latency and Memory Analysis of Transformer Models for Training and Inference (★402, updated 3 weeks ago)
- ★73, updated 4 months ago
- FTPipe and related pipeline model parallelism research. (★41, updated last year)
- ★93, updated 2 years ago
- Collection of kernels written in the Triton language (★117, updated last month)
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (★202, updated last year)
- A tensor-aware point-to-point communication primitive for machine learning (★255, updated 2 years ago)
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. (★38, updated 2 years ago)
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … (★131, updated last week)
- ★116, updated last year
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs (★236, updated last month)
- Training neural networks in TensorFlow 2.0 with 5x less memory (★129, updated 3 years ago)