inclusionAI / asystem-awex
A high-performance RL training-inference weight-synchronization framework, designed to enable second-level parameter updates from training to inference in RL workflows.
☆131 · Updated last month
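To make the framework's purpose concrete, here is a minimal, purely illustrative sketch of the training-to-inference weight-sync pattern that tools like asystem-awex optimize. All class and function names below are hypothetical stand-ins; this is not the asystem-awex API, and a real system would stream tensors over high-speed interconnects rather than copy Python dicts.

```python
# Illustrative sketch of RL training -> inference weight synchronization.
# Hypothetical names throughout; NOT the asystem-awex API.

class InferenceReplica:
    """Serves requests using a snapshot of model weights."""
    def __init__(self, weights):
        self.weights = dict(weights)
        self.version = 0

    def apply_update(self, new_weights, version):
        # A production framework would do an in-place, zero-copy or
        # RDMA-based transfer to avoid pausing inference; here we just
        # swap in a fresh dict.
        self.weights = dict(new_weights)
        self.version = version

class Trainer:
    """Produces a new weight version after each RL update step."""
    def __init__(self, weights):
        self.weights = dict(weights)
        self.version = 0

    def train_step(self):
        # Stand-in for a gradient update on every parameter.
        self.weights = {k: v + 1.0 for k, v in self.weights.items()}
        self.version += 1

def sync(trainer, replicas):
    # Broadcast the freshest weights to every inference replica.
    for r in replicas:
        r.apply_update(trainer.weights, trainer.version)

trainer = Trainer({"layer0.w": 0.0})
replicas = [InferenceReplica(trainer.weights) for _ in range(2)]
trainer.train_step()
sync(trainer, replicas)
print(replicas[0].version)  # 1
```

The point of a dedicated sync framework is to make the `sync` step fast enough (seconds, not minutes) that inference replicas can serve near-fresh policies during RL rollouts.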
Alternatives and similar repositories for asystem-awex
Users interested in asystem-awex are comparing it to the libraries listed below.
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆269 · Updated last week
- Toolchain built around Megatron-LM for distributed training ☆86 · Updated 2 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆263 · Updated last week
- ☆342 · Updated 2 weeks ago
- The driver for LMCache core to run in vLLM ☆60 · Updated last year
- An early-research-stage expert-parallel load balancer for MoE models based on linear programming ☆496 · Updated 2 months ago
- torchcomms: a modern PyTorch communications API ☆330 · Updated this week
- KV cache store for distributed LLM inference ☆390 · Updated 2 months ago
- Perplexity's open-source garden for inference technology ☆362 · Updated last month
- A high-performance, lightweight router for large-scale vLLM deployment ☆112 · Updated this week
- LLM Serving Performance Evaluation Harness ☆83 · Updated 11 months ago
- ☆85 · Updated 3 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆69 · Updated last year
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆87 · Updated 2 weeks ago
- ☆36 · Updated 2 months ago
- 🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation… ☆116 · Updated 3 months ago
- Efficient and easy multi-instance LLM serving ☆524 · Updated 5 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆209 · Updated last year
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆902 · Updated last week
- Allows torch tensor memory to be released and resumed later ☆216 · Updated 3 weeks ago
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime ☆858 · Updated this week
- ☆96 · Updated 10 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆391 · Updated this week
- ☆47 · Updated last year
- [ICLR 2025] Breaking the Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆141 · Updated last year
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆191 · Updated last week
- Accelerating MoE with IO- and Tile-aware Optimizations ☆569 · Updated 3 weeks ago
- An NCCL extension library designed to efficiently offload GPU memory allocated by the NCCL communication library ☆90 · Updated last month
- High-performance distributed data-shuffling (all-to-all) library for MoE training and inference ☆112 · Updated last month
- ☆73 · Updated 4 months ago