imbue-ai / cluster-health
☆295 · Updated 6 months ago
Alternatives and similar repositories for cluster-health:
Users interested in cluster-health are comparing it to the libraries listed below.
- A low-latency & high-throughput serving engine for LLMs ☆312 · Updated 3 weeks ago
- Zero Bubble Pipeline Parallelism ☆336 · Updated last week
- Applied AI experiments and examples for PyTorch ☆225 · Updated this week
- CUDA checkpoint and restore utility ☆289 · Updated 3 weeks ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆91 · Updated last week
- GPUd automates monitoring, diagnostics, and issue identification for GPUs ☆276 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆79 · Updated 3 weeks ago
- Efficient and easy multi-instance LLM serving ☆295 · Updated this week
- ☆179 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆222 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs ☆737 · Updated 5 months ago
- Ring attention implementation with flash attention ☆674 · Updated 2 months ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆297 · Updated this week
- A PyTorch Native LLM Training Framework ☆732 · Updated last month
- A library to analyze PyTorch traces. ☆332 · Updated last week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆514 · Updated this week
- 10x Faster Long-Context LLM By Smart KV Cache Optimizations ☆469 · Updated this week
- A fast communication-overlapping library for tensor parallelism on GPUs. ☆296 · Updated 3 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆183 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆187 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆364 · Updated last week
- Materials for learning SGLang ☆265 · Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆236 · Updated this week
- ring-attention experiments ☆123 · Updated 4 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆290 · Updated this week
- PyTorch per-step fault tolerance (actively under development) ☆243 · Updated this week
- Disaggregated serving system for Large Language Models (LLMs). ☆466 · Updated 6 months ago
- LLM KV cache compression made easy ☆397 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆221 · Updated 6 months ago
- Module, Model, and Tensor Serialization/Deserialization ☆212 · Updated this week