runpod-workers / worker-vllmLinks

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

☆386

Alternatives and similar repositories for worker-vllm

Users that are interested in worker-vllm are comparing it to the libraries listed below

Sorting:

runpod / runpod-python
🐍 | Python library for RunPod API and serverless worker SDK.
☆258Updated 2 weeks ago
epolewski / EricLLM
A fast batching API to serve LLM models
☆189Updated last year
basetenlabs / truss-examples
Examples of models deployable with Truss
☆212Updated this week
AI-Commandos / LLaMa2lang
Convenience scripts to finetune (chat-)LLaMa3 and other models for any language
☆315Updated last year
TheBlokeAI / dockerLLM
TheBloke's Dockerfiles
☆308Updated last year
Vaibhavs10 / optimise-my-whisper
☆207Updated last year
galatolofederico / microchain
function calling-based LLM agents
☆289Updated last year
Maximilian-Winter / llama-cpp-agent
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM …
☆610Updated 9 months ago
severian42 / Vodalus-Expert-LLM-Forge
Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation …
☆190Updated last year
QuixiAI / OpenChatML
☆164Updated 4 months ago
TrelisResearch / one-click-llms
One click templates for inferencing Language Models
☆221Updated 2 weeks ago
anyscale / llm-router
Tutorial for building LLM router
☆236Updated last year
cohere-ai / cohere-terrarium
A simple Python sandbox for helpful LLM data agents
☆297Updated last year
migtissera / Sensei
Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI
☆222Updated last year
matatonic / openedai-vision
An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal.
☆266Updated 9 months ago
hommayushi3 / exllama-runpod-serverless
☆50Updated 2 years ago
QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆240Updated last year
rag-wtf / open-text-embeddings
Open Source Text Embedding Models with OpenAI Compatible API
☆164Updated last year
c0sogi / llama-api
An OpenAI-like LLaMA inference API
☆113Updated 2 years ago
taprosoft / llm_finetuning
Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GTPQ, bitsandbytes…
☆146Updated 2 years ago
Preemo-Inc / text-generation-inference
☆198Updated last year
EQ-bench / EQ-Bench
A benchmark for emotional intelligence in large language models
☆389Updated last year
itsme2417 / PolyMind
A multimodal, function calling powered LLM webui.
☆217Updated last year
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆179Updated last year
lamini-ai / lamini-examples
☆164Updated 9 months ago
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆209Updated last year
adrienbrault / hf-gguf-to-ollama
Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com.
☆119Updated last year
tjmlabs / AgentRun
The easiest, and fastest way to run AI-generated Python code safely
☆341Updated last year
promptslab / LLMtuner
FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text)
☆246Updated last year
matt-c1 / llama-3-quant-comparison
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
☆165Updated last year