sasha0552 / vllm-ci
CI scripts designed to build a Pascal-compatible version of vLLM.
☆12Updated last year
Alternatives and similar repositories for vllm-ci
Users that are interested in vllm-ci are comparing it to the libraries listed below
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs☆532Updated last week
- The main repository for building Pascal-compatible versions of ML applications and libraries.☆136Updated 2 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.☆165Updated last year
- The official API server for Exllama. OAI compatible, lightweight, and fast.☆1,068Updated last week
- Web UI for ExLlamaV2☆510Updated 8 months ago
- A multimodal, function calling powered LLM webui.☆216Updated last year
- Comparison of Language Model Inference Engines☆231Updated 10 months ago
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web☆269Updated this week
- A fast batching API to serve LLM models☆188Updated last year
- ☆83Updated 2 weeks ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/Cuda with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati…☆149Updated this week
- Make abliterated models with transformers, easy and fast☆89Updated 6 months ago
- Large-scale LLM inference engine☆1,567Updated last week
- Stable Diffusion and Flux in pure C/C++☆21Updated last week
- A pipeline parallel training script for LLMs.☆158Updated 5 months ago
- Memoir+ a persona memory extension for Text Gen Web UI.☆216Updated 3 weeks ago
- GPU Power and Performance Manager☆60Updated last year
- Easily view and modify JSON datasets for large language models☆83Updated 5 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆249Updated last year
- automatically quant GGUF models☆212Updated last week
- A fork of vLLM enabling Pascal architecture GPUs☆29Updated 8 months ago
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text☆76Updated 11 months ago
- CHAracter State Management - a generative text adventure☆48Updated 4 months ago
- This is our own implementation of 'Layer Selective Rank Reduction'☆239Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆131Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)☆721Updated last week
- ☆51Updated 8 months ago
- LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU vi…☆842Updated this week
- llama.cpp fork with additional SOTA quants and improved performance☆1,266Updated this week
- ☆55Updated last week