A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆78 · Apr 6, 2024 · Updated 2 years ago
Alternatives and similar repositories for ray_vllm_inference
Users that are interested in ray_vllm_inference are comparing it to the libraries listed below.
- RLVR Testing and Training ☆23 · Aug 28, 2025 · Updated 8 months ago
- RayLLM - LLMs on Ray (Archived). Read README for more info. ☆1,267 · Mar 13, 2025 · Updated last year
- MLFlow Deployment Plugin for Ray Serve ☆47 · Apr 12, 2022 · Updated 4 years ago
- Bootstrap curated Kubernetes stacks. Logging, metrics, ingress and more - delivered with gitops. ☆14 · Feb 15, 2022 · Updated 4 years ago
- A suite of hands-on training materials that shows how to scale CV, NLP, and time-series forecasting workloads with Ray. ☆456 · Feb 13, 2024 · Updated 2 years ago
- Deprecated Browserbase Python SDK ☆10 · Nov 1, 2024 · Updated last year
- LLM query engine to retrieve augmented responses from json files. ☆15 · Oct 12, 2023 · Updated 2 years ago
- ☆36 · Apr 30, 2025 · Updated last year
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 · Sep 23, 2025 · Updated 8 months ago
- API Extensions for core KubeVela. ☆14 · Feb 1, 2026 · Updated 3 months ago
- Added functionality to the cml python package ☆14 · Apr 1, 2026 · Updated last month
- Codebase, data and models for the Headline Grouping paper at NAACL2021 ☆12 · Oct 2, 2022 · Updated 3 years ago
- A Rust crate offering similar functionality to the Python transformers package using Candle. ☆14 · Nov 19, 2024 · Updated last year
- Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial) ☆11 · Jul 19, 2024 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Jul 22, 2023 · Updated 2 years ago
- OS Signal Handlers in Go ☆11 · Jan 6, 2021 · Updated 5 years ago
- POC integration Airbyte+Dagster+Langchain ☆13 · Jun 1, 2023 · Updated 2 years ago
- Build modern UIs in Jupyter with Python ☆12 · Dec 28, 2022 · Updated 3 years ago
- nvidia-smi xml to json ☆15 · May 29, 2024 · Updated last year
- End-to-end information extraction pipeline built with LayoutLMv2, a pretrained model from Hugging Face ☆11 · Aug 15, 2023 · Updated 2 years ago
- Explore Multiple Vector Databases and chat with documents on Multiple LLM models, private LLM models ☆49 · Jun 1, 2023 · Updated 2 years ago
- Quora Paraphrasing Dataset Bahasa Indonesia Version ☆11 · Apr 18, 2021 · Updated 5 years ago
- A demo and tutorial for Council that implements a financial analyst agent. ☆11 · Jun 21, 2024 · Updated last year
- kapi provides a simplified interface to the controller-runtime library. ☆26 · Aug 20, 2025 · Updated 9 months ago
- ☆10 · Jul 13, 2023 · Updated 2 years ago
- ☆91 · Oct 2, 2023 · Updated 2 years ago
- ☆14 · Sep 18, 2024 · Updated last year
- [CVPR 2026 Main] MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation ☆24 · Updated this week
- ☆11 · May 22, 2021 · Updated 5 years ago
- [ICLR 25 Spotlight] A testbed for agents and environments that can automatically improve models through data generation. ☆28 · Mar 4, 2025 · Updated last year
- 4 bits quantization of LLaMa using GPTQ ☆12 · Jun 2, 2023 · Updated 2 years ago
- ☆12 · Jul 10, 2023 · Updated 2 years ago
- Expose a qubes vm port to the public interfaces of the sys-net vm. ☆13 · Jun 24, 2018 · Updated 7 years ago
- ☆17 · May 5, 2022 · Updated 4 years ago
- Agentkube - Run Kubernetes Like Never Before ☆38 · Mar 1, 2026 · Updated 2 months ago
- Host LLM via text-generation-inference ☆16 · Dec 5, 2023 · Updated 2 years ago
- ☆28 · Jul 29, 2025 · Updated 9 months ago
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆82 · Oct 21, 2025 · Updated 7 months ago
- 3D Mesh Generation from 2D Images in Python ☆13 · Feb 12, 2024 · Updated 2 years ago