ray-project / Ray-Connect
Material for Ray Connect 2024 Conference
☆11 · Updated 11 months ago
Alternatives and similar repositories for Ray-Connect
Users interested in Ray-Connect are comparing it to the libraries listed below.
- The driver for LMCache core to run in vLLM ☆51 · Updated 7 months ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray ☆132 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay i… ☆145 · Updated last week (see the minimal Ray sketch after this list)
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆113 · Updated last year
- ☆59 · Updated last year
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆95 · Updated 2 years ago
- Modular and structured prompt caching for low-latency LLM inference ☆100 · Updated 10 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆129 · Updated last year
- Stateful LLM Serving ☆84 · Updated 6 months ago
- Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving. ☆99 · Updated last year
- PyTorch distributed training acceleration framework ☆52 · Updated last month
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆211 · Updated 3 weeks ago
- LLM Serving Performance Evaluation Harness ☆79 · Updated 7 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆71 · Updated this week
- Efficient and easy multi-instance LLM serving ☆487 · Updated 3 weeks ago
- LLM Inference benchmark ☆426 · Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆480 · Updated 6 months ago
- ☆24 · Updated 5 months ago
- ☆218 · Updated 2 years ago
- ☆55 · Updated 10 months ago
- KV cache store for distributed LLM inference ☆335 · Updated 2 weeks ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆183 · Updated last year
- ☆47 · Updated last year
- ☆130 · Updated 11 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆108 · Updated 4 months ago
- ☆95 · Updated 6 months ago
- OpenEmbedding is an open-source framework for TensorFlow distributed training acceleration. ☆33 · Updated 2 years ago
- A high-performance serving system for DeepRec based on TensorFlow Serving. ☆19 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆418 · Updated 3 months ago
- LLM inference via Triton (flexible & modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model ☆44 · Updated 3 weeks ago
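Several of the entries above build on or extend Ray itself (the Intel "LLMs on Intel platforms with Ray" stack, AntRay, Mobius). For orientation, here is a minimal sketch of Ray's core remote-task API; it uses only documented Ray calls (`ray.init`, `@ray.remote`, `ray.get`), and the local single-node init is an illustrative assumption rather than a real cluster setup.

```python
import ray

# Start a local Ray runtime for illustration; on a real cluster you
# would connect instead, e.g. ray.init(address="auto").
ray.init()

# Decorating a function with @ray.remote turns it into a task that
# Ray can schedule on any worker in the cluster.
@ray.remote
def square(x: int) -> int:
    return x * x

# Each .remote() call returns an ObjectRef immediately;
# the four tasks run in parallel.
refs = [square.remote(i) for i in range(4)]

# ray.get blocks until the results are ready and fetches them.
print(ray.get(refs))  # [0, 1, 4, 9]

ray.shutdown()
```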