asprenger / ray_vllm_inferenceLinks

A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.

☆69

Alternatives and similar repositories for ray_vllm_inference

Users that are interested in ray_vllm_inference are comparing it to the libraries listed below

Sorting:

intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆128Updated 3 weeks ago
substratusai / vllm-docker
☆63Updated 4 months ago
npuichigo / openai_trtllm
OpenAI compatible API for TensorRT LLM triton backend
☆209Updated last year
bentoml / llm-bench
☆55Updated 8 months ago
bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
☆138Updated last week
lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆222Updated 7 months ago
premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆137Updated last year
LLM-inference-router / vllm-router
vLLM Router
☆39Updated last year
backprop-ai / vllm-benchmark
Benchmarking the serving capabilities of vLLM
☆47Updated 11 months ago
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆91Updated 5 months ago
triton-inference-server / vllm_backend
☆280Updated this week
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆76Updated 3 weeks ago
run-ai / runai-model-streamer
☆231Updated this week
etalab-ia / albert-models
Deployment a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embeddings models.
☆42Updated last year
vectorch-ai / ScaleLLM
A high-performance inference system for large language models, designed for production environments.
☆460Updated last week
chu-tianxiang / vllm-gptq
A high-throughput and memory-efficient inference and serving engine for LLMs
☆131Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆265Updated 9 months ago
limcheekin / open-text-embeddings
Open Source Text Embedding Models with OpenAI Compatible API
☆156Updated last year
nyunAI / PruneGPT
☆51Updated last year
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆438Updated this week
NVIDIA / workbench-llamafactory
This is an NVIDIA AI Workbench example project that demonstrates an end-to-end model development workflow using Llamafactory.
☆62Updated 9 months ago
h2oai / enterprise-h2ogpte
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform
☆87Updated last month
MooreThreads / TurboRAG
☆79Updated 8 months ago
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆91Updated last year
hamelsmu / llama-inference
experiments with inference on llama
☆104Updated last year
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆101Updated 11 months ago
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆244Updated last year
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆87Updated this week
milvus-io / milvus-model
A library integrating embedding and reranker models from OpenAI, SentenceTransformers etc for semantic search in vector database.
☆47Updated 4 months ago
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆33Updated this week