A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
☆77 · Apr 6, 2024 · Updated 2 years ago
Alternatives and similar repositories for ray_vllm_inference
Users who are interested in ray_vllm_inference are comparing it to the libraries listed below.
- ☆13 · May 25, 2023 · Updated 2 years ago
- A two-part tutorial for Ray Core APIs and Ray Serve for Model Deployment ☆21 · Jun 9, 2022 · Updated 3 years ago
- RayLLM - LLMs on Ray (Archived). Read README for more info. ☆1,267 · Mar 13, 2025 · Updated last year
- A suite of hands-on training materials that show how to scale CV, NLP, and time-series forecasting workloads with Ray. ☆457 · Feb 13, 2024 · Updated 2 years ago
- LLM query engine to retrieve augmented responses from JSON files. ☆15 · Oct 12, 2023 · Updated 2 years ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Mar 31, 2024 · Updated 2 years ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆130 · Sep 23, 2025 · Updated 6 months ago
- Finetune multiple pre-trained Transformer-based models to solve Vietnamese Fake News Detection problem (ReINTEL) in VLSP2020 shared task ☆18 · Dec 16, 2020 · Updated 5 years ago
- FlexOS is a Unikraft-based OS allowing users to easily specialize the safety and isolation strategy at compilation time. ☆24 · Jun 2, 2023 · Updated 2 years ago
- Karmada APIs ☆15 · Mar 10, 2026 · Updated last month
- A web interface for SleekDB written in PHP ☆11 · Jan 22, 2022 · Updated 4 years ago
- Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial) ☆11 · Jul 19, 2024 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Jul 22, 2023 · Updated 2 years ago
- OpenTracing example ☆14 · Aug 19, 2024 · Updated last year
- Mixpost Installation with Docker Containers ☆14 · Mar 15, 2023 · Updated 3 years ago
- Build modern UIs in Jupyter with Python ☆12 · Dec 28, 2022 · Updated 3 years ago
- nvidia-smi xml to json ☆15 · May 29, 2024 · Updated last year
- Explore multiple vector databases and chat with documents on multiple LLM models, including private LLM models ☆48 · Jun 1, 2023 · Updated 2 years ago
- Quora Paraphrasing Dataset Bahasa Indonesia Version ☆11 · Apr 18, 2021 · Updated 4 years ago
- GitHub repo for Peifeng's internship project ☆13 · Nov 7, 2023 · Updated 2 years ago
- Blogging with Emacs and AI ☆11 · Jun 4, 2023 · Updated 2 years ago
- kapi provides a simplified interface to the controller-runtime library. ☆26 · Aug 20, 2025 · Updated 7 months ago
- ☆10 · Sep 5, 2024 · Updated last year
- ☆10 · Dec 12, 2023 · Updated 2 years ago
- My Gen AI research ☆11 · Jun 3, 2024 · Updated last year
- Desktop application for instant AI-powered text transformation. Translate, correct, summarize, and change the tone of any text, anywhere,… ☆30 · Dec 29, 2025 · Updated 3 months ago
- ☆14 · Sep 18, 2024 · Updated last year
- ☆11 · May 22, 2021 · Updated 4 years ago
- A testbed for agents and environments that can automatically improve models through data generation. ☆28 · Mar 4, 2025 · Updated last year
- Vietnamese Drink Ordering Chatbot = Intent classification + Context Handler + Address fuzzy matching + Facebook Built-in NLP ☆17 · Dec 18, 2021 · Updated 4 years ago
- Implementation of lodash library in Go 1.18 generics [WIP] ☆12 · Apr 29, 2022 · Updated 3 years ago
- Benchmark suite for LLMs from Fireworks.ai ☆97 · Updated this week
- Agentkube - Run Kubernetes Like Never Before ☆37 · Mar 1, 2026 · Updated last month
- Trino On K8S Via Helm & Metastore Workshop Querying Delta Tables ☆12 · Jan 27, 2025 · Updated last year
- ☆28 · Jul 29, 2025 · Updated 8 months ago
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆80 · Oct 21, 2025 · Updated 5 months ago
- Finetuning a codegen model with python instruction set using QLoRA technique for better efficacy ☆11 · Aug 31, 2023 · Updated 2 years ago
- A vLLM proxy server to add security and multi-model management for vLLM servers ☆12 · May 30, 2024 · Updated last year
- UNSUPPORTED A tool to convert and play GameMaker games in the browser ☆23 · May 6, 2013 · Updated 12 years ago