asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆65 · Updated last year
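To give a sense of what such an integration involves, here is a minimal sketch of serving vLLM behind Ray Serve. It is not the repository's actual code; the model name, request schema, and deployment options are assumptions, and the synchronous `LLM` class is used only to keep the sketch short.

```python
# Minimal sketch (not the repository's actual code): a Ray Serve deployment
# that wraps vLLM's synchronous LLM class. The model name, request schema,
# and GPU count are illustrative assumptions.
from ray import serve
from vllm import LLM, SamplingParams


@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self, model: str = "facebook/opt-125m"):
        # Each replica loads the model once; vLLM batches requests internally.
        self.llm = LLM(model=model)

    async def __call__(self, request):
        # Ray Serve passes the incoming HTTP request (a Starlette Request).
        payload = await request.json()
        params = SamplingParams(
            max_tokens=payload.get("max_tokens", 128),
            temperature=payload.get("temperature", 0.7),
        )
        # The blocking generate() call keeps the sketch short; a real service
        # would typically use vLLM's async engine instead.
        outputs = self.llm.generate([payload["prompt"]], params)
        return {"text": outputs[0].outputs[0].text}


app = VLLMDeployment.bind()
# serve.run(app)  # exposes the deployment over HTTP on the local Ray cluster
```

Under these assumptions, a client would POST JSON with a `prompt` field to the deployment's HTTP endpoint and receive the generated text back.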
Alternatives and similar repositories for ray_vllm_inference:
Users interested in ray_vllm_inference are comparing it to the libraries listed below.
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆124 · Updated last week
- ☆241 · Updated this week
- ☆49 · Updated 4 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated 2 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆204 · Updated 8 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 8 months ago
- ☆57 · Updated 2 weeks ago
- ☆54 · Updated 6 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆248 · Updated this week
- ☆17 · Updated 2 years ago
- Comparison of Language Model Inference Engines ☆212 · Updated 3 months ago
- ☆190 · Updated last week
- Deploy a light, full OpenAI-compatible API for production with vLLM, supporting /v1/embeddings with all embedding models. ☆42 · Updated 8 months ago
- experiments with inference on llama ☆104 · Updated 10 months ago
- Self-host LLMs with vLLM and BentoML ☆100 · Updated this week
- A pipeline for LLM knowledge distillation ☆100 · Updated last week
- ☆74 · Updated 4 months ago
- vLLM Router ☆26 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆25 · Updated this week
- ☆117 · Updated last year
- A collection of available inference solutions for LLMs ☆84 · Updated last month
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆95 · Updated 7 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆131 · Updated 9 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆47 · Updated 5 months ago
- Data preparation code for Amber 7B LLM ☆87 · Updated 11 months ago
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 · Updated last year
- Open Source Text Embedding Models with OpenAI Compatible API ☆151 · Updated 9 months ago
- ☆185 · Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 6 months ago
- ☆53 · Updated 10 months ago