intel / llm-on-rayLinks

Pretrain, finetune and serve LLMs on Intel platforms with Ray

☆128

Alternatives and similar repositories for llm-on-ray

Users that are interested in llm-on-ray are comparing it to the libraries listed below

Sorting:

bentoml / llm-bench
☆55Updated 8 months ago
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆79Updated 5 months ago
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆45Updated 5 months ago
asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆69Updated last year
run-ai / llmperf
☆58Updated 10 months ago
anyscale / llm-continuous-batching-benchmarks
☆120Updated last year
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆76Updated 3 weeks ago
triton-inference-server / vllm_backend
☆280Updated this week
tyler-griggs / melange-release
☆47Updated last year
intel / xFasterTransformer
☆429Updated last week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆180Updated this week
yale-sys / prompt-cache
Modular and structured prompt caching for low-latency LLM inference
☆97Updated 8 months ago
vectorch-ai / ScaleLLM
A high-performance inference system for large language models, designed for production environments.
☆460Updated last week
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆244Updated last year
neuralmagic / AutoFP8
☆195Updated 2 months ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆87Updated this week
AdrianBZG / LLM-distributed-finetune
Tune efficiently any LLM model from HuggingFace using distributed training (multiple GPU) and DeepSpeed. Uses Ray AIR to orchestrate the …
☆59Updated 2 years ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆198Updated this week
vllm-project / dashboard
vLLM performance dashboard
☆32Updated last year
huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆307Updated 2 months ago
sgl-project / sgl-learning-materials
Materials for learning SGLang
☆515Updated 2 weeks ago
microsoft / sarathi-serve
A low-latency & high-throughput serving engine for LLMs
☆397Updated 2 months ago
vllm-project / recipes
Common recipes to run vLLM
☆68Updated this week
vllm-project / flash-attention
Fast and memory-efficient exact attention
☆82Updated this week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆265Updated 9 months ago
sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆202Updated this week
InternLM / turbomind
☆92Updated 4 months ago
HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
☆78Updated this week
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆349Updated 11 months ago
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆454Updated this week