sasha0552 / vllm-ci
CI scripts designed to build a Pascal-compatible version of vLLM.
☆12 · Updated 10 months ago
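For context on what "Pascal-compatible" means here: Pascal GPUs (e.g. GTX 10-series, Tesla P40/P100) report CUDA compute capability 6.x, which upstream vLLM builds generally do not target — hence this repo and the forks listed below. As a minimal sketch (not part of vllm-ci itself), you can check whether a local GPU is a Pascal-class device using PyTorch's standard CUDA introspection API:

```python
# Hypothetical helper, not from vllm-ci: detect Pascal-class GPUs.
# Assumes PyTorch is installed with CUDA support.
import torch

def is_pascal(device_index: int = 0) -> bool:
    """Return True if the given CUDA device is Pascal (sm_60/sm_61/sm_62)."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability(device_index)
    return major == 6

if __name__ == "__main__":
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        print(f"{name}: sm_{major}{minor} -> Pascal: {is_pascal()}")
    else:
        print("No CUDA device detected.")
```

If this reports a Pascal device, stock vLLM wheels will typically refuse to run on it, which is the gap the projects in this list address.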
Alternatives and similar repositories for vllm-ci
Users interested in vllm-ci are comparing it to the libraries listed below.
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆95 · Updated last month
- A fork of vLLM enabling Pascal-architecture GPUs. ☆28 · Updated 4 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆154 · Updated last year
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs. ☆419 · Updated last week
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ☆209 · Updated 10 months ago
- LM inference server implementation based on *.cpp. ☆226 · Updated this week
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web using DuckDuckGo. ☆249 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆131 · Updated last year
- A fast batching API for serving LLMs. ☆183 · Updated last year
- A multimodal, function-calling-powered LLM webui. ☆214 · Updated 9 months ago
- A pipeline-parallel training script for LLMs. ☆150 · Updated last month
- Easily view and modify JSON datasets for large language models. ☆76 · Updated last month
- Automatically quantize GGUF models. ☆184 · Updated this week
- Low-rank adapter extraction for fine-tuned transformers models. ☆173 · Updated last year
- Convenient wrapper for fine-tuning and inference of large language models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…). ☆147 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models. ☆238 · Updated last year
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆89 · Updated 2 weeks ago
- Dictionary-based SLOP detector and analyzer for ShareGPT JSON and text. ☆70 · Updated 7 months ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, with easy export to ONNX/ONNX Runtime. ☆172 · Updated 2 months ago
- Merge Transformers language models using gradient parameters. ☆206 · Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance. ☆608 · Updated this week
- GPU Power and Performance Manager. ☆59 · Updated 8 months ago
- The official API server for Exllama. OpenAI-compatible, lightweight, and fast. ☆990 · Updated this week
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆327 · Updated last week
- Open-source text embedding models with an OpenAI-compatible API. ☆154 · Updated 11 months ago
- Comparison of Language Model Inference Engines. ☆217 · Updated 6 months ago
- Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU/GPU via HF, vLLM, and SGLa… ☆633 · Updated this week
- Our own implementation of 'Layer-Selective Rank Reduction'. ☆239 · Updated last year
- ☆49 · Updated 4 months ago
- Lightweight inference server for OpenVINO. ☆187 · Updated last week