sasha0552 / vllm-ci
CI scripts designed to build a Pascal-compatible version of vLLM.
☆12 · Updated last year
Alternatives and similar repositories for vllm-ci
Users interested in vllm-ci are comparing it to the libraries listed below.
- A fork of vLLM enabling Pascal architecture GPUs ☆31 · Updated 11 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆165 · Updated last year
- Web UI for ExLlamaV2 ☆514 · Updated 11 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆617 · Updated last week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,115 · Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆162 · Updated 4 months ago
- GPU Power and Performance Manager ☆65 · Updated last year
- A multimodal, function-calling-powered LLM web UI. ☆217 · Updated last year
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web ☆275 · Updated last week
- A fast batching API to serve LLM models ☆188 · Updated last year
- Automatically quantize GGUF models ☆220 · Updated 3 weeks ago
- A pipeline parallel training script for LLMs. ☆165 · Updated 8 months ago
- An OpenAI-compatible API for chat with image input and questions about the images (i.e., multimodal). ☆267 · Updated 10 months ago
- ☆88 · Updated last month
- Dictionary-based SLOP detector and analyzer for ShareGPT JSON and plain text ☆80 · Updated last month
- Inference engine for Intel devices. Serves LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints. ☆274 · Updated last week
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 Languages Supported | OpenAI API Compatible ☆345 · Updated 10 months ago
- Stable Diffusion and Flux in pure C/C++ ☆24 · Updated this week
- Make abliterated models with transformers, easy and fast ☆112 · Updated last month
- LLM frontend in a single HTML file ☆682 · Updated 3 weeks ago
- Large-scale LLM inference engine ☆1,622 · Updated last week
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆153 · Updated this week
- A daemon that automatically manages the performance states of NVIDIA GPUs. ☆109 · Updated 2 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆21 · Updated last week
- ☆51 · Updated 11 months ago
- ☆210 · Updated 2 weeks ago
- LM inference server implementation based on *.cpp. ☆294 · Updated last month
- Easily view and modify JSON datasets for large language models ☆86 · Updated 8 months ago
- 8-bit CUDA functions for PyTorch, ROCm-compatible ☆41 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 · Updated last year