imbue-ai / cluster-health
☆225 · Updated 3 weeks ago
Related projects:
- A library to analyze PyTorch traces. ☆270 · Updated last week
- NVIDIA NCCL Tests for Distributed Training ☆59 · Updated last month
- CUDA checkpoint and restore utility ☆193 · Updated 5 months ago
- Zero Bubble Pipeline Parallelism ☆254 · Updated 2 weeks ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆150 · Updated last week
- Applied AI experiments and examples for PyTorch ☆123 · Updated last month
- A throughput-oriented high-performance serving framework for LLMs ☆470 · Updated this week
- Latency and memory analysis of Transformer models for training and inference ☆338 · Updated 3 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆144 · Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆174 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆210 · Updated last month
- A PyTorch-native LLM training framework ☆581 · Updated 3 weeks ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆416 · Updated this week
- FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16–32 tokens. ☆562 · Updated 2 weeks ago
- A large-scale simulation framework for LLM inference ☆236 · Updated 3 weeks ago
- Microsoft Automatic Mixed Precision Library ☆507 · Updated this week
- Efficient serverless deployment for large AI models. ☆178 · Updated this week
- Ring attention implementation with flash attention ☆529 · Updated this week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆95 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆148 · Updated last month
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆452 · Updated last week
- Pipeline parallelism for PyTorch ☆708 · Updated 3 weeks ago
- FlashInfer: kernel library for LLM serving ☆1,143 · Updated this week
- Disaggregated serving system for large language models (LLMs) ☆278 · Updated last month
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆145 · Updated this week
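Several of the projects above (the SDPA/Flash entry, the ring-attention implementation, FlashInfer) center on scaled dot-product attention. As a point of reference, here is a minimal single-head NumPy sketch of the underlying computation; the function name and tensor shapes are illustrative and not taken from any listed project's API:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, causal=False):
    """Reference single-head scaled dot-product attention (illustrative)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # (Lq, Lk) similarity logits
    if causal:
        # Mask future positions so query i only attends to keys j <= i.
        mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (Lq, d) attention-weighted values

q = np.random.randn(4, 8)
k = np.random.randn(4, 8)
v = np.random.randn(4, 8)
out = scaled_dot_product_attention(q, k, v, causal=True)
print(out.shape)  # (4, 8)
```

The flash-attention and ring-attention projects compute this same result, but tiled and fused so the full (Lq, Lk) score matrix is never materialized (and, for ring attention, sharded across devices).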