cisco-open / pymultiworld
A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL
☆19 Updated last week
Alternatives and similar repositories for pymultiworld
Users who are interested in pymultiworld are comparing it to the libraries listed below.
- Triton-based implementation of Sparse Mixture of Experts. ☆233 Updated 8 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆237 Updated last month
- Manage ML configuration with pydantic ☆16 Updated 3 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆260 Updated this week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 Updated 7 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆190 Updated this week
- extensible collectives library in triton ☆88 Updated 4 months ago
- Applied AI experiments and examples for PyTorch ☆290 Updated 2 months ago
- ring-attention experiments ☆149 Updated 10 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆41 Updated last year
- A resilient distributed training framework ☆94 Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆260 Updated 3 weeks ago
- PyTorch Single Controller ☆361 Updated last week
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆176 Updated 11 months ago
- Allow torch tensor memory to be released and resumed later ☆109 Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆208 Updated last week
- PyTorch bindings for CUTLASS grouped GEMM. ☆109 Updated 2 months ago
- Triton-based Symmetric Memory operators and examples ☆23 Updated this week
- Toolchain built around the Megatron-LM for Distributed Training ☆58 Updated last week
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆79 Updated 9 months ago
- A minimal implementation of vllm. ☆51 Updated last year
- ☆110 Updated 11 months ago
- Collection of kernels written in Triton language ☆145 Updated 4 months ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆93 Updated last week
- A simple calculation for LLM MFU. ☆43 Updated 5 months ago
- A Quirky Assortment of CuTe Kernels ☆407 Updated this week
- Python package for rematerialization-aware gradient checkpointing ☆25 Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆214 Updated last year
- DeeperGEMM: crazy optimized version ☆71 Updated 3 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆40 Updated 2 years ago