sasha0552 / vllm-ci
CI scripts designed to build a Pascal-compatible version of vLLM.
☆12 · Updated last year
Alternatives and similar repositories for vllm-ci
Users interested in vllm-ci are comparing it to the libraries listed below.
- The official API server for Exllama. OAI compatible, lightweight, and fast. (☆1,097 · Updated last week)
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. (☆165 · Updated last year)
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs (☆597 · Updated this week)
- The main repository for building Pascal-compatible versions of ML applications and libraries. (☆156 · Updated 3 months ago)
- Web UI for ExLlamaV2 (☆514 · Updated 10 months ago)
- A fast batching API to serve LLM models (☆189 · Updated last year)
- A fork of vLLM enabling Pascal architecture GPUs (☆30 · Updated 9 months ago)
- A pipeline parallel training script for LLMs. (☆164 · Updated 7 months ago)
- Dictionary-based SLOP detector and analyzer for ShareGPT JSON and plain text (☆79 · Updated last week)
- A multimodal, function calling powered LLM webui. (☆217 · Updated last year)
- GPU Power and Performance Manager (☆62 · Updated last year)
- ☆108 · Updated 3 months ago
- Comparison of Language Model Inference Engines (☆237 · Updated last year)
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web (☆271 · Updated last month)
- An OpenAI-compatible API for chat with image input and questions about the images, i.e. multimodal. (☆266 · Updated 9 months ago)
- ☆88 · Updated last week
- Easily view and modify JSON datasets for large language models (☆84 · Updated 7 months ago)
- Stable Diffusion and Flux in pure C/C++ (☆24 · Updated this week)
- Memoir+, a persona memory extension for Text Gen Web UI. (☆221 · Updated 3 weeks ago)
- Croco.Cpp is a fork of KoboldCPP for inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… (☆154 · Updated last week)
- Automatically quantize GGUF models (☆218 · Updated last month)
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models (☆257 · Updated last year)
- 8-bit CUDA functions for PyTorch, ROCm compatible (☆41 · Updated last year)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆132 · Updated last year)
- ☆51 · Updated 9 months ago
- Large-scale LLM inference engine (☆1,607 · Updated 3 weeks ago)
- Low-Rank adapter extraction for fine-tuned transformers models (☆179 · Updated last year)
- A daemon that automatically manages the performance states of NVIDIA GPUs. (☆101 · Updated last month)
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible (☆344 · Updated 9 months ago)
- Reinforcement learning toolkit for RWKV (v6, v7, ARWKV): distillation, SFT, RLHF (DPO, ORPO), infinite context training, and aligning. Exploring the… (☆56 · Updated 2 months ago)