Run multiple resource-heavy Large Models (LMs) on the same machine with a limited amount of VRAM or other resources by exposing each one on a different port and loading/unloading them on demand
☆90 · Mar 16, 2026 · Updated this week
Alternatives and similar repositories for large-model-proxy
Users who are interested in large-model-proxy are comparing it to the libraries listed below
- Modified beam search with periodic restarts ☆12 · Sep 12, 2024 · Updated last year
- A proxy that hosts multiple single-model runners such as llama.cpp and vLLM ☆13 · May 30, 2025 · Updated 9 months ago
- ☆20 · Sep 28, 2024 · Updated last year
- ☆17 · Dec 16, 2024 · Updated last year
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users… ☆14 · Jan 2, 2026 · Updated 2 months ago
- ☆210 · Jan 5, 2026 · Updated 2 months ago
- A library and CLI utilities for managing performance states of NVIDIA GPUs. ☆34 · Oct 6, 2024 · Updated last year
- Synthify: Seamlessly generate AI datasets with a no-code UI | https://synthify.toolstack.run ☆48 · Feb 9, 2025 · Updated last year
- Reliable model swapping for any local OpenAI/Anthropic-compatible server: llama.cpp, vLLM, etc. ☆2,807 · Updated this week
- Run Ollama and GGUF models easily with a single command ☆52 · May 15, 2024 · Updated last year
- Helps agents work more efficiently by translating Cline/Roo-Code tool calls into native tool calls in the API ☆59 · Sep 28, 2025 · Updated 5 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆35 · Jan 18, 2026 · Updated 2 months ago
- A frontend for creative writing with LLMs ☆156 · Jul 15, 2024 · Updated last year
- An auto-sleeping and auto-waking framework around llama.cpp ☆12 · Feb 8, 2025 · Updated last year
- ☆12 · May 30, 2025 · Updated 9 months ago
- Large-Language-Model to Machine Interface project. ☆19 · Dec 5, 2023 · Updated 2 years ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆48 · Sep 26, 2024 · Updated last year
- llama-swap plus a minimal Ollama-compatible API ☆52 · Updated this week
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI … ☆56 · Feb 10, 2025 · Updated last year
- An MCP server implementation providing a standardized interface for LLMs to interact with the Atla API. ☆17 · Jul 21, 2025 · Updated 7 months ago
- CI scripts designed to build a Pascal-compatible version of vLLM. ☆12 · Aug 10, 2024 · Updated last year
- GPU Power and Performance Manager ☆69 · Oct 13, 2024 · Updated last year
- The one who calls upon functions: a function-calling language model ☆36 · Oct 2, 2023 · Updated 2 years ago
- Autonomous, agentic, creative story-writing system that incorporates stored embeddings and knowledge graphs. ☆98 · Feb 16, 2026 · Updated last month
- "a towel is about the most massively useful thing an interstellar AI hitchhiker can have" ☆48 · Oct 9, 2024 · Updated last year
- Efficient visual programming for AI language models ☆362 · May 13, 2025 · Updated 10 months ago
- These agents work with any local model. You ask your question and simply indicate the number of agents and experts who will answer it… ☆19 · Feb 25, 2024 · Updated 2 years ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆30 · May 18, 2025 · Updated 10 months ago
- ☆13 · Mar 10, 2025 · Updated last year
- ☆20 · Aug 12, 2024 · Updated last year
- ☆30 · Oct 4, 2024 · Updated last year
- A web app for exploring topics using LLMs (less typing, more clicks) ☆68 · Updated this week
- Using Pinecone, LangChain, and OpenAI for generative Q&A with Retrieval-Augmented Generation (RAG). ☆16 · Aug 9, 2023 · Updated 2 years ago
- This project is a reverse-engineered version of Figma's tone changer. It uses Groq's Llama-3-8b for high-speed inference and to adjust th… ☆90 · Jul 26, 2024 · Updated last year
- An open-source AI agent that lives in your terminal. ☆30 · Updated this week
- TLS & API keys for your LLM APIs ☆20 · Dec 17, 2025 · Updated 3 months ago
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline. ☆45 · Jan 28, 2024 · Updated 2 years ago
- A simple no-install web UI for Ollama and OAI-Compatible APIs! ☆31 · Jan 30, 2025 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆41 · Aug 4, 2023 · Updated 2 years ago