asprenger / ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆62 · Updated 10 months ago
Alternatives and similar repositories for ray_vllm_inference:
Users interested in ray_vllm_inference are comparing it to the repositories listed below.
- ☆224 · Updated this week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆112 · Updated last week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆190 · Updated this week
- Self-host LLMs with vLLM and BentoML ☆86 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ☆66 · Updated last week
- Comparison of Language Model Inference Engines ☆204 · Updated 2 months ago
- ☆45 · Updated 3 months ago
- OpenAI compatible API for TensorRT LLM triton backend ☆191 · Updated 6 months ago
- ☆159 · Updated this week
- ☆53 · Updated 8 months ago
- Deploy a lightweight, full OpenAI-compatible API in production with vLLM, supporting /v1/embeddings with all embedding models. ☆40 · Updated 7 months ago
- ☆172 · Updated 4 months ago
- experiments with inference on llama ☆104 · Updated 8 months ago
- ☆17 · Updated last year
- ☆52 · Updated 5 months ago
- Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang ☆35 · Updated 3 months ago
- A collection of all available inference solutions for LLMs ☆77 · Updated 5 months ago
- Data preparation code for the Amber 7B LLM ☆85 · Updated 9 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 6 months ago
- A pipeline for LLM knowledge distillation ☆89 · Updated 3 weeks ago
- Efficiently tune any LLM from HuggingFace using distributed training (multi-GPU) and DeepSpeed. Uses Ray AIR to orchestrate the … ☆55 · Updated last year
- ☆117 · Updated 9 months ago
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios. ☆92 · Updated 6 months ago
- ☆53 · Updated last month
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 · Updated 2 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆237 · Updated 11 months ago
- ☆117 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆130 · Updated 7 months ago
- This is an NVIDIA AI Workbench example project that demonstrates an end-to-end model development workflow using Llamafactory. ☆46 · Updated 4 months ago
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 · Updated last year