Run multiple resource-heavy Large Models (LMs) on the same machine with a limited amount of VRAM or other resources by exposing them on different ports and loading/unloading them on demand.
☆90 · Mar 23, 2026 · Updated this week
Alternatives and similar repositories for large-model-proxy
Users who are interested in large-model-proxy are comparing it to the libraries listed below.
- Modified Beam Search with periodical restart ☆12 · Sep 12, 2024 · Updated last year
- A proxy that hosts multiple single-model runners such as LLama.cpp and vLLM ☆13 · May 30, 2025 · Updated 9 months ago
- ☆20 · Sep 28, 2024 · Updated last year
- ☆17 · Dec 16, 2024 · Updated last year
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users… ☆14 · Jan 2, 2026 · Updated 2 months ago
- ☆211 · Jan 5, 2026 · Updated 2 months ago
- A library and CLI utilities for managing performance states of NVIDIA GPUs. ☆34 · Oct 6, 2024 · Updated last year
- Synthify: Seamlessly generate ai datasets with a no-code UI | https://synthify.toolstack.run ☆48 · Feb 9, 2025 · Updated last year
- Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc ☆2,868 · Updated this week
- run ollama & gguf easily with a single command ☆52 · May 15, 2024 · Updated last year
- Helps agents work more efficiently by translating cline/Roo-Code tool calls into native tool calls in the API ☆59 · Sep 28, 2025 · Updated 6 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆35 · Jan 18, 2026 · Updated 2 months ago
- A frontend for creative writing with LLMs ☆157 · Jul 15, 2024 · Updated last year
- an auto-sleeping and -waking framework around llama.cpp ☆12 · Feb 8, 2025 · Updated last year
- ☆12 · May 30, 2025 · Updated 9 months ago
- Large-Language-Model to Machine Interface project. ☆19 · Dec 5, 2023 · Updated 2 years ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆48 · Sep 26, 2024 · Updated last year
- llama-swap + a minimal ollama compatible api ☆54 · Mar 14, 2026 · Updated 2 weeks ago
- Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI … ☆56 · Feb 10, 2025 · Updated last year
- An MCP server implementation providing a standardized interface for LLMs to interact with the Atla API. ☆17 · Jul 21, 2025 · Updated 8 months ago
- Important ideas ☆18 · Oct 13, 2025 · Updated 5 months ago
- CI scripts designed to build a Pascal-compatible version of vLLM. ☆12 · Aug 10, 2024 · Updated last year
- Simple high-throughput inference library ☆155 · May 14, 2025 · Updated 10 months ago
- ☆24 · Apr 9, 2024 · Updated last year
- GPU Power and Performance Manager ☆69 · Oct 13, 2024 · Updated last year
- The one who calls upon functions - Function-Calling Language Model ☆36 · Oct 2, 2023 · Updated 2 years ago
- Autonomous, agentic, creative story writing system that incorporates stored embeddings and Knowledge Graphs. ☆102 · Feb 16, 2026 · Updated last month
- "a towel is about the most massively useful thing an interstellar AI hitchhiker can have" ☆48 · Oct 9, 2024 · Updated last year
- Efficient visual programming for AI language models ☆362 · May 13, 2025 · Updated 10 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning ☆31 · May 18, 2025 · Updated 10 months ago
- ☆13 · Mar 10, 2025 · Updated last year
- ☆20 · Aug 12, 2024 · Updated last year
- ☆30 · Oct 4, 2024 · Updated last year
- A web-app to explore topics using LLM (less typing and more clicks) ☆68 · Mar 15, 2026 · Updated last week
- Using Pinecone, LangChain + OpenAI for Generative Q&A with Retrieval Augmented Generation (RAG). ☆16 · Aug 9, 2023 · Updated 2 years ago
- This project is a reverse-engineered version of Figma's tone changer. It uses Groq's Llama-3-8b for high-speed inference and to adjust th… ☆90 · Jul 26, 2024 · Updated last year
- ☆22 · Jun 13, 2024 · Updated last year
- An open-source AI agent that lives in your terminal. ☆33 · Mar 20, 2026 · Updated last week
- TLS & API keys for your LLM APIs ☆20 · Dec 17, 2025 · Updated 3 months ago