imbue-ai / cluster-health (☆281, updated 3 months ago)
Alternatives and similar repositories for cluster-health. Users interested in cluster-health are comparing it to the libraries listed below:
- Zero Bubble Pipeline Parallelism (☆290, updated last month)
- A library to analyze PyTorch traces (☆312, updated 2 weeks ago)
- NVIDIA NCCL Tests for Distributed Training (☆73, updated 2 weeks ago)
- (☆241, updated this week)
- Pipeline Parallelism for PyTorch (☆729, updated 3 months ago)
- CUDA checkpoint and restore utility (☆246, updated 8 months ago)
- Latency and Memory Analysis of Transformer Models for Training and Inference (☆358, updated last month)
- This repository contains the experimental PyTorch native float8 training UX (☆213, updated 4 months ago)
- A PyTorch Native LLM Training Framework (☆678, updated 3 months ago)
- A tool for bandwidth measurements on NVIDIA GPUs (☆328, updated last month)
- Applied AI experiments and examples for PyTorch (☆182, updated last week)
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (☆204, updated this week)
- A throughput-oriented high-performance serving framework for LLMs (☆654, updated 2 months ago)
- (☆162, updated this week)
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton (☆496, updated last month)
- Ring attention implementation with flash attention (☆606, updated last week)
- Making Long-Context LLM Inference 10x Faster and 10x Cheaper (☆286, updated this week)
- FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16–32 tokens (☆655, updated 3 months ago)
- Materials for learning SGLang (☆136, updated last week)
- NCCL Tests (☆931, updated last month)
- MSCCL++: A GPU-driven communication stack for scalable AI applications (☆263, updated this week)
- A low-latency & high-throughput serving engine for LLMs (☆272, updated 3 months ago)
- Module, Model, and Tensor Serialization/Deserialization (☆196, updated 2 weeks ago)
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components (☆175, updated this week)
- Efficient and easy multi-instance LLM serving (☆245, updated this week)
- Cataloging released Triton kernels (☆142, updated 3 months ago)
- Dynamic Memory Management for Serving LLMs without PagedAttention (☆256, updated last week)
- Efficient LLM Inference over Long Sequences (☆307, updated last week)
- (☆112, updated 9 months ago)
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … (☆53, updated 2 weeks ago)