sasha0552 / vllm-ci
CI scripts designed to build a Pascal-compatible version of vLLM.
☆12 Updated 7 months ago
Alternatives and similar repositories for vllm-ci:
Users who are interested in vllm-ci are comparing it to the libraries listed below.
- A multimodal, function calling powered LLM webui. ☆214 Updated 6 months ago
- A fork of vLLM enabling Pascal architecture GPUs ☆25 Updated last month
- automatically quant GGUF models ☆164 Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆64 Updated 2 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆131 Updated 9 months ago
- Easily view and modify JSON datasets for large language models ☆72 Updated last month
- Comparison of Language Model Inference Engines ☆210 Updated 3 months ago
- A fast batching API to serve LLM models ☆183 Updated 11 months ago
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2. ☆148 Updated 10 months ago
- A pipeline parallel training script for LLMs. ☆136 Updated this week
- OpenAI compatible API for TensorRT LLM triton backend ☆202 Updated 8 months ago
- ☆52 Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆231 Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆565 Updated last week
- An OpenAI API compatible API for chat with image input and questions about the images. aka Multimodal. ☆245 Updated 3 weeks ago
- Open source LLM UI, compatible with all local LLM providers. ☆173 Updated 6 months ago
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web using DuckDuckGo ☆232 Updated this week
- Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLa… ☆406 Updated this week
- Stable Diffusion and Flux in pure C/C++ ☆14 Updated this week
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation … ☆178 Updated 8 months ago
- Merge Transformers language models by use of gradient parameters. ☆205 Updated 7 months ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ☆299 Updated this week
- ☆46 Updated last month
- An unsupervised model merging algorithm for Transformers-based language models. ☆104 Updated 11 months ago
- Web UI for ExLlamaV2 ☆486 Updated last month
- Low-Rank adapter extraction for fine-tuned transformers models ☆171 Updated 11 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 Updated 10 months ago
- ☆83 Updated 3 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆113 Updated 9 months ago
- Lightweight Inference server for OpenVINO ☆144 Updated this week