intel / llm-on-ray
Pretrain, finetune, and serve LLMs on Intel platforms with Ray
☆131 · Updated last month
Alternatives and similar repositories for llm-on-ray
Users interested in llm-on-ray are comparing it to the libraries listed below.
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving (a minimal sketch of this pattern follows the list). ☆76 · Updated last year
- ☆56 · Updated 11 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆83 · Updated last week
- ☆57 · Updated last year
- The driver for LMCache core to run in vLLM ☆56 · Updated 9 months ago
- LLM Serving Performance Evaluation Harness ☆80 · Updated 8 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆229 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆299 · Updated this week
- ☆431 · Updated last month
- ☆121 · Updated last year
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆628 · Updated this week
- ☆309 · Updated last week
- ☆48 · Updated last year
- ☆205 · Updated 6 months ago
- Efficient and easy multi-instance LLM serving ☆506 · Updated 2 months ago
- A high-performance inference system for large language models, designed for production environments. ☆482 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆317 · Updated last month
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆307 · Updated last week
- ☆264 · Updated this week
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆249 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization ☆349 · Updated last year
- vLLM performance dashboard ☆37 · Updated last year
- Offline optimization of your disaggregated Dynamo graph ☆105 · Updated this week
- Fast and memory-efficient exact attention ☆97 · Updated last week
- ☆120 · Updated last year
- Common recipes to run vLLM ☆214 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- A throughput-oriented high-performance serving framework for LLMs ☆912 · Updated 2 weeks ago
- ☆97 · Updated 7 months ago
- A low-latency & high-throughput serving engine for LLMs ☆440 · Updated 3 weeks ago