Optimizing inference proxy for LLMs
☆3,381 · Jan 28, 2026 · Updated last month
Alternatives and similar repositories for optillm
Users who are interested in optillm are comparing it to the libraries listed below.
- Entropy Based Sampling and Parallel CoT Decoding ☆3,432 · Nov 13, 2024 · Updated last year
- Structured Outputs ☆13,564 · Mar 9, 2026 · Updated last week
- DSPy: The framework for programming—not prompting—language models ☆32,853 · Updated this week
- Tools for merging pretrained large language models. ☆6,867 · Updated this week
- Go ahead and axolotl questions ☆11,460 · Updated this week
- ☆1,033 · Dec 17, 2024 · Updated last year
- ☆968 · Jan 23, 2025 · Updated last year
- CaSIL is an advanced natural language processing system that implements a sophisticated four-layer semantic analysis architecture. It pro… ☆67 · Nov 5, 2024 · Updated last year
- Plano is an AI-native proxy and data plane for agentic apps — with built-in orchestration, safety, observability, and smart LLM routing s… ☆5,971 · Updated this week
- Large-scale LLM inference engine ☆1,677 · Mar 12, 2026 · Updated last week
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ☆12,291 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,460 · Mar 4, 2026 · Updated 2 weeks ago
- Create Custom LLMs ☆1,820 · Nov 8, 2025 · Updated 4 months ago
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆39,597 · Updated this week
- One command brings a complete pre-wired LLM stack with hundreds of services to explore. ☆2,508 · Updated this week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,208 · Dec 30, 2025 · Updated 2 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆24,455 · Updated this week
- structured outputs for llms ☆12,551 · Updated this week
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,019 · Dec 22, 2024 · Updated last year
- A library for advanced large language model reasoning ☆2,338 · Jun 10, 2025 · Updated 9 months ago
- Letta is the platform for building stateful agents: AI with advanced memory that can learn and self-improve over time. ☆21,579 · Mar 13, 2026 · Updated last week
- Harness LLMs with Multi-Agent Programming ☆3,932 · Updated this week
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,882 · May 17, 2025 · Updated 10 months ago
- Open-source implementation of AlphaEvolve ☆5,676 · Updated this week
- Efficient visual programming for AI language models ☆362 · May 13, 2025 · Updated 10 months ago
- A language model programming library. ☆5,881 · Jun 5, 2025 · Updated 9 months ago
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre … ☆806 · Feb 9, 2026 · Updated last month
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ☆7,728 · Nov 7, 2025 · Updated 4 months ago
- aider is AI pair programming in your terminal ☆41,939 · Mar 9, 2026 · Updated last week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,121 · Mar 9, 2026 · Updated last week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,154 · Updated this week
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM. ☆54,096 · Updated this week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,698 · Aug 10, 2024 · Updated last year
- Agentic AI framework for enterprise workflow automation. ☆1,546 · Apr 18, 2025 · Updated 11 months ago
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… ☆18,730 · Mar 9, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆73,479 · Updated this week
- Fast, flexible LLM inference ☆6,713 · Updated this week
- A toolkit to create optimal Production-ready Retrieval Augmented Generation (RAG) setup for your data ☆1,530 · May 20, 2025 · Updated 10 months ago
- ☆337 · Mar 5, 2026 · Updated 2 weeks ago