cduk / vllm-pascal
A fork of vLLM enabling Pascal architecture GPUs
☆20Updated last week
Related projects
Alternatives and complementary repositories for vllm-pascal
- Text-to-speech API endpoint compatible with OpenAI's TTS API endpoint, using Microsoft Edge TTS to generate speech for free locally☆131Updated this week
- A fast batching API to serve LLM models☆172Updated 6 months ago
- idea: https://github.com/nyxkrage/ebook-groupchat/ ☆82Updated 3 months ago
- Automatically quantize GGUF models☆140Updated this week
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a…☆106Updated 4 months ago
- This extension enhances the capabilities of textgen-webui by integrating advanced vision models, allowing users to have contextualized co…☆47Updated last month
- ☆40Updated 2 months ago
- A multimodal, function-calling-powered LLM web UI.☆208Updated last month
- Memoir+, a persona extension for Text Gen Web UI that includes memory, emotions, command handling, and more.☆171Updated last month
- ☆25Updated last month
- Real-time TTS reading of large text files in your favourite voice, plus translation via LLM (Python script)☆47Updated last month
- An OpenAI-compatible API for multimodal chat: image input and questions about the images.☆203Updated last month
- Simple UI for Llama-3.2-11B-Vision & Molmo-7B-D☆114Updated last month
- An extension that lets the AI take the wheel, allowing it to use the mouse and keyboard, recognize UI elements, and prompt itself :3...no…☆96Updated last month
- Dictionary-based SLOP detector and analyzer for ShareGPT JSON and text☆45Updated 2 weeks ago
- Gradio-based tool to run open-source LLMs directly from Hugging Face☆87Updated 4 months ago
- Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It submodules NVIDIA’s TensorRT-LLM for GPU a…☆40Updated last month
- A Python application that routes incoming prompts to an LLM by category, and can support a single incoming connection from a front end to…☆171Updated this week
- Ollama chat client in Vue, with everything you need to run a private text RPG in the browser☆99Updated 3 weeks ago
- Experimental LLM Inference UX to aid in creative writing☆106Updated 4 months ago
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM.☆245Updated this week
- CHAracter State Management - a generative text adventure☆31Updated last month
- ☆112Updated this week
- A local and uncensored AI entity.☆50Updated last month
- Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of…☆47Updated last month
- StockLlama is a time series forecasting model based on Llama, enhanced with custom embeddings for improved accuracy.☆27Updated 2 months ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications.☆25Updated this week
- A library and CLI utilities for managing performance states of NVIDIA GPUs.☆19Updated last month
- Dataset Crafting w/ RAG/Wikipedia ground truth and Efficient Fine-Tuning Using MLX and Unsloth. Includes configurable dataset annotation …☆162Updated 4 months ago
- ☆128Updated this week