runpod-workers / worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
☆307 · Updated this week
Alternatives and similar repositories for worker-vllm:
Users interested in worker-vllm are comparing it to the libraries listed below.
- A fast batching API to serve LLM models — ☆182 · Updated 11 months ago
- Low-Rank adapter extraction for fine-tuned transformers models — ☆171 · Updated 11 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … — ☆552 · Updated 2 months ago
- Customizable implementation of the self-instruct paper. — ☆1,043 · Updated last year
- Large-scale LLM inference engine — ☆1,395 · Updated this week
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI — ☆223 · Updated 11 months ago
- TheBloke's Dockerfiles — ☆303 · Updated last year
- Tutorial for building LLM router — ☆193 · Updated 9 months ago
- Convenience scripts to finetune (chat-)LLaMa3 and other models for any language — ☆304 · Updated 10 months ago
- ☆204 · Updated 10 months ago
- A Lightweight Library for AI Observability — ☆241 · Updated 2 months ago
- Examples of models deployable with Truss — ☆169 · Updated this week
- This is our own implementation of 'Layer Selective Rank Reduction' — ☆235 · Updated 11 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… — ☆147 · Updated last year
- ☆199 · Updated last year
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation …