runpod-workers / worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
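Since worker-vllm serves an OpenAI-compatible route on RunPod serverless, a minimal client sketch can illustrate how such an endpoint is typically called. Assumptions: the endpoint ID `abc123` and model name `default` are hypothetical placeholders, and the `/v2/<endpoint_id>/openai/v1` URL shape follows worker-vllm's documented OpenAI-compatible API; only the request is built here, nothing is sent.

```python
# Sketch of calling a worker-vllm endpoint through its OpenAI-compatible
# route. Endpoint ID, API key, and model name below are placeholders.
import json
import urllib.request

def base_url(endpoint_id: str) -> str:
    """Build the OpenAI-compatible base URL for a RunPod serverless endpoint
    (URL shape assumed from the worker-vllm README)."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1"

def chat_request(endpoint_id: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a chat-completions request."""
    body = json.dumps({
        "model": "default",  # hypothetical: the model the worker is serving
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        base_url(endpoint_id) + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    req = chat_request("abc123", "RUNPOD_API_KEY", "Hello!")
    print(req.full_url)
```

Sending the prepared request with `urllib.request.urlopen(req)` (or any OpenAI-compatible client pointed at `base_url(...)`) would return a standard chat-completions JSON response.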
☆287 · Updated last week
Alternatives and similar repositories for worker-vllm:
Users interested in worker-vllm are comparing it to the libraries listed below.
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆223 · Updated 10 months ago
- A fast batching API to serve LLM models ☆180 · Updated 10 months ago
- TheBloke's Dockerfiles ☆305 · Updated 11 months ago
- A multimodal, function calling powered LLM webui. ☆215 · Updated 5 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 · Updated 9 months ago
- Examples of models deployable with Truss ☆161 · Updated this week
- Low-Rank adapter extraction for fine-tuned transformers models ☆170 · Updated 10 months ago
- A bagel, with everything. ☆316 · Updated 10 months ago
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation … ☆176 · Updated 7 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- 🐍 | Python library for RunPod API and serverless worker SDK. ☆212 · Updated last month
- function calling-based LLM agents ☆283 · Updated 5 months ago
- ☆152 · Updated 7 months ago
- ☆124 · Updated last month
- ☆199 · Updated 9 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆538 · Updated 2 weeks ago
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal. ☆226 · Updated 2 months ago
- ☆111 · Updated 2 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆123 · Updated last year
- Local LLM ReAct Agent with Guidance ☆156 · Updated last year
- Web UI for ExLlamaV2 ☆484 · Updated 3 weeks ago
- The code we currently use to fine-tune models. ☆113 · Updated 9 months ago
- idea: https://github.com/nyxkrage/ebook-groupchat/ ☆86 · Updated 6 months ago
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ☆157 · Updated last year
- ☆168 · Updated last year
- automatically quant GGUF models ☆156 · Updated last week
- Large-scale LLM inference engine ☆1,314 · Updated this week
- Fine-tuning LLMs using QLoRA ☆249 · Updated 8 months ago
- Merge Transformers language models by use of gradient parameters. ☆205 · Updated 6 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆146 · Updated 9 months ago