ray-project / Ray-Connect
Material for Ray Connect 2024 Conference
☆12 · Updated last year
Alternatives and similar repositories for Ray-Connect
Users interested in Ray-Connect are comparing it to the libraries listed below.
- The driver for LMCache core to run in vLLM ☆54 · Updated 8 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆131 · Updated last month
- Modular and structured prompt caching for low-latency LLM inference ☆101 · Updated 11 months ago
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay i… ☆153 · Updated this week
- LLM Serving Performance Evaluation Harness ☆79 · Updated 8 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆80 · Updated this week
- ☆56 · Updated 11 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆222 · Updated this week
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆104 · Updated last week
- ☆47 · Updated last year
- Stateful LLM Serving ☆87 · Updated 7 months ago
- A low-latency & high-throughput serving engine for LLMs ☆431 · Updated last week
- KV cache store for distributed LLM inference ☆346 · Updated last month
- ☆121 · Updated last year
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆97 · Updated 2 years ago
- PyTorch distributed training acceleration framework ☆53 · Updated 2 months ago
- Efficient and easy multi-instance LLM serving ☆502 · Updated last month
- ☆97 · Updated 7 months ago
- Materials for learning SGLang ☆618 · Updated 3 weeks ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆116 · Updated last year
- ☆58 · Updated last year
- JAX backend for SGL ☆78 · Updated this week
- Toolchain built around Megatron-LM for distributed training ☆67 · Updated last week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆113 · Updated 5 months ago
- Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training, and serving. ☆100 · Updated last year
- ☆26 · Updated 6 months ago
- ☆82 · Updated 11 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆58 · Updated 11 months ago
- Deploy ChatGLM on Modelz ☆16 · Updated 2 years ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆188 · Updated last year