CI scripts designed to build a Pascal-compatible version of vLLM.
☆12 Aug 10, 2024 Updated last year
Alternatives and similar repositories for vllm-ci
Users who are interested in vllm-ci are comparing it to the libraries listed below
- A library and CLI utilities for managing performance states of NVIDIA GPUs. ☆33 Oct 6, 2024 Updated last year
- Structured TRIZ prompt engineering for LLMs in an open, portable XML format – MIT licensed. ☆16 Nov 11, 2025 Updated 3 months ago
- ☆11 Jan 28, 2024 Updated 2 years ago
- ☆12 Jan 19, 2024 Updated 2 years ago
- Modified Beam Search with periodical restart ☆12 Sep 12, 2024 Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆13 Mar 30, 2024 Updated last year
- ☆14 Aug 25, 2024 Updated last year
- Oobabooga "Hello World" API example for node.js with Express ☆13 Jul 2, 2023 Updated 2 years ago
- An OpenAI API compatible images server to generate or manipulate images. ☆17 Feb 2, 2025 Updated last year
- Cross-GPU KV Cache Marketplace ☆23 Nov 12, 2025 Updated 3 months ago
- Let's have some retro gaming fun with AI! Join the discord: https://discord.gg/5xXzkMu8Zk ☆68 Nov 19, 2025 Updated 3 months ago
- LLM backed Fantasy Tribe Game ☆19 Nov 21, 2024 Updated last year
- Friendly Terminal Assistant for Developers ☆17 Mar 23, 2024 Updated last year
- Implementation of stop sequencer for Huggingface Transformers ☆16 Jun 6, 2023 Updated 2 years ago
- Personal voice assistant, with voice interruption and Twilio support ☆18 Feb 24, 2025 Updated last year
- ☆17 Dec 16, 2024 Updated last year
- A tool that can be used to measure the sequential performance of any OpenAI-compatible LLM API ☆23 Aug 1, 2024 Updated last year
- ☆17 Dec 18, 2023 Updated 2 years ago
- Multi-agent autonomous research system using LangGraph and LangChain. Generates citation-backed reports with credibility scoring and web … ☆145 Feb 28, 2026 Updated last week
- A guide to testing different runpod (and other linux VMs) configurations. Specifically the speed of LLM outputs ☆17 Jan 12, 2024 Updated 2 years ago
- Benchmarking tool for vLLM inference performance with GPU monitoring ☆41 Nov 24, 2025 Updated 3 months ago
- Building synthetic data for preference tuning ☆27 Dec 26, 2024 Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆88 Mar 2, 2026 Updated last week
- ☆16 Jul 13, 2024 Updated last year
- A fork of textgen that kept some things like Exllama and old GPTQ. ☆22 Aug 20, 2024 Updated last year
- Submission to the inverse scaling prize ☆23 Jul 23, 2023 Updated 2 years ago
- ☆24 Mar 10, 2025 Updated 11 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆30 May 18, 2025 Updated 9 months ago
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ☆24 Mar 5, 2025 Updated last year
- 5X faster 60% less memory QLoRA finetuning ☆21 May 28, 2024 Updated last year
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine ☆26 Dec 20, 2024 Updated last year
- Demo of an "always-on" AI assistant. ☆24 Feb 14, 2024 Updated 2 years ago
- Stable Diffusion and Flux in pure C/C++ ☆24 Mar 1, 2026 Updated last week
- Prometheus exporter for Linux based GDDR6/GDDR6X VRAM and GPU Core Hot spot temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆24 Oct 2, 2024 Updated last year
- A simple no-install web UI for Ollama and OAI-Compatible APIs! ☆31 Jan 30, 2025 Updated last year
- Discord chatbot interface to train an LLM on user message history ☆27 Jun 9, 2023 Updated 2 years ago
- In this repository, 16 models compete to outperform each other in the game Town of Salem. Each model is randomly assigned roles like Vamp… ☆43 Jun 14, 2025 Updated 8 months ago
- entropix style sampling + GUI ☆27 Oct 30, 2024 Updated last year
- Agent that writes consistent and interesting long stories for any fiction form ☆136 Nov 12, 2025 Updated 3 months ago