runpod-workers / worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
☆313 · Updated last week
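Since worker-vllm serves an OpenAI-compatible endpoint, a client can talk to it with a standard OpenAI-style chat completion request. Below is a minimal sketch of building such a request body; the endpoint ID, base-URL shape, and model name are illustrative assumptions for this sketch, not values taken from the repository:

```python
import json

# Hypothetical endpoint ID for illustration; a real deployment supplies its own.
ENDPOINT_ID = "abc123"
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1"  # assumed URL shape

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> str:
    """Serialize an OpenAI-style chat completion body as JSON."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(body)

# The resulting JSON string would be POSTed to f"{BASE_URL}/chat/completions".
payload = build_chat_request("mistralai/Mistral-7B-Instruct-v0.2", "Hello!")
print(payload)
```

Any OpenAI-compatible client library could be pointed at the same base URL instead of hand-building the body; the sketch uses only the standard library to keep the request shape explicit.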
Alternatives and similar repositories for worker-vllm
Users interested in worker-vllm are comparing it to the repositories listed below.
- A fast batching API for serving LLMs ☆182 · Updated last year
- TheBloke's Dockerfiles ☆303 · Updated last year
- Large-scale LLM inference engine ☆1,419 · Updated this week
- ☆52 · Updated last year
- Our own implementation of 'Layer Selective Rank Reduction' ☆238 · Updated 11 months ago
- A multimodal, function-calling-powered LLM web UI. ☆214 · Updated 7 months ago
- ☆199 · Updated last year
- Open Source Text Embedding Models with an OpenAI-Compatible API ☆153 · Updated 10 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆205 · Updated 9 months ago
- Low-rank adapter extraction for fine-tuned transformer models ☆173 · Updated last year
- 🐍 | Python library for the RunPod API and serverless worker SDK. ☆228 · Updated last month
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆562 · Updated 3 months ago
- Generate synthetic data using OpenAI, MistralAI, or AnthropicAI ☆223 · Updated last year
- ☆156 · Updated 10 months ago
- Merge Transformers language models using gradient parameters. ☆208 · Updated 9 months ago
- Some simple scripts that I use day-to-day when working with LLMs and the Huggingface Hub ☆162 · Updated last year
- Dataset crafting with RAG/Wikipedia ground truth and efficient fine-tuning using MLX and Unsloth. Includes configurable dataset annotation … ☆183 · Updated 9 months ago
- ☆54 · Updated last year
- An OpenAI-like LLaMA inference API ☆112 · Updated last year
- Customizable implementation of the Self-Instruct paper. ☆1,043 · Updated last year
- Function-calling-based LLM agents ☆285 · Updated 8 months ago
- Dagger functions to import Hugging Face GGUF models into a local Ollama instance and optionally push them to ollama.com. ☆115 · Updated 11 months ago
- Web UI for ExLlamaV2 ☆496 · Updated 3 months ago
- ☆130 · Updated 2 weeks ago
- The easiest and fastest way to run AI-generated Python code safely ☆308 · Updated 5 months ago
- An OpenAI-API-compatible REST server for llama. ☆207 · Updated 2 months ago
- Falcon LLM ggml framework with CPU and GPU support ☆246 · Updated last year
- ☆204 · Updated 11 months ago
- An LLMOps pipeline that fine-tunes a small LLM to prepare for outages of the service LLM. ☆304 · Updated last month
- Examples of models deployable with Truss ☆170 · Updated this week