sasha0552 / vllm-ci
CI scripts designed to build a Pascal-compatible version of vLLM.
☆12 · Updated last year
Alternatives and similar repositories for vllm-ci
Users interested in vllm-ci often compare it to the libraries listed below.
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆493 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,047 · Updated 3 weeks ago
- Web UI for ExLlamaV2 ☆513 · Updated 7 months ago
- A multimodal, function-calling-powered LLM web UI. ☆216 · Updated 11 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆127 · Updated 3 weeks ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆164 · Updated last year
- A fork of vLLM enabling Pascal-architecture GPUs ☆28 · Updated 6 months ago
- Large-scale LLM inference engine ☆1,548 · Updated this week
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web ☆269 · Updated this week
- A fast batching API to serve LLM models ☆187 · Updated last year
- ☆83 · Updated this week
- Stable Diffusion and Flux in pure C/C++ ☆21 · Updated this week
- A pipeline-parallel training script for LLMs. ☆159 · Updated 4 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆589 · Updated 7 months ago
- Automatically quantize GGUF models ☆200 · Updated this week
- An OpenAI-compatible API for chat with image input and questions about the images, a.k.a. multimodal. ☆260 · Updated 6 months ago
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆247 · Updated this week
- Memoir+, a persona memory extension for Text Gen Web UI. ☆214 · Updated last month
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆337 · Updated 6 months ago
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆96 · Updated 2 weeks ago
- LM inference server implementation based on *.cpp. ☆273 · Updated last month
- Make abliterated models with transformers, quickly and easily ☆86 · Updated 5 months ago
- Comparison of language model inference engines ☆229 · Updated 9 months ago
- KoboldCpp Smart Launcher with GPU layer and tensor-override tuning ☆27 · Updated 4 months ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆136 · Updated this week
- LLM frontend in a single HTML file ☆644 · Updated 8 months ago
- LLM model quantization (compression) toolkit with hardware-acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPU vi… ☆778 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 · Updated last year
- Dataset crafting with RAG/Wikipedia ground truth and efficient fine-tuning using MLX and Unsloth. Includes configurable dataset annotation … ☆185 · Updated last year
- Open-source LLM UI, compatible with all local LLM providers. ☆174 · Updated 11 months ago