The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It lets users chat with LLMs, execute structured function calls, and get structured output, and it also works with models that are not fine-tuned for JSON output or function calling.
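The structured-output trick shared by llama-cpp-agent and several of the projects listed below is grammar-constrained decoding: a JSON Schema is compiled into a GBNF grammar that llama.cpp enforces at sampling time, which is why even models not fine-tuned for JSON stay on format. Below is a toy sketch of the schema-to-grammar step, covering only flat objects with string/number fields; it is an illustration of the idea, not the framework's actual converter.

```python
def schema_to_gbnf(schema: dict) -> str:
    """Compile a flat JSON Schema object into a minimal GBNF grammar.

    Simplified sketch: handles only top-level "string" and "number"
    properties, emitted as required keys in schema order.
    """
    type_rules = {
        "string": 'string ::= "\\"" [^"]* "\\""',
        "number": 'number ::= "-"? [0-9]+ ("." [0-9]+)?',
    }
    props = schema["properties"]
    # One quoted key literal followed by a rule reference per property.
    pairs = [f'"\\"{name}\\":" {spec["type"]}' for name, spec in props.items()]
    sep = ' "," '  # a literal comma between key-value pairs
    rules = ['root ::= "{" ' + sep.join(pairs) + ' "}"']
    for t in sorted({spec["type"] for spec in props.values()}):
        rules.append(type_rules[t])
    return "\n".join(rules)

grammar = schema_to_gbnf({
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "number"}},
})
print(grammar)
```

Feeding the resulting grammar string to llama.cpp (e.g. through llama-cpp-python's grammar support) restricts token sampling so the model can only emit JSON that matches the schema.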
☆622 · Mar 9, 2026 · Updated last week
Alternatives and similar repositories for llama-cpp-agent
Users interested in llama-cpp-agent are comparing it to the libraries listed below.
- ToolAgents is a lightweight and flexible framework for creating function-calling agents with various language models and APIs. ☆29 · Mar 15, 2026 · Updated last week
- ☆32 · Dec 29, 2023 · Updated 2 years ago
- Locally running LLM with internet access ☆97 · Jun 30, 2025 · Updated 8 months ago
- Function calling-based LLM agents ☆291 · Sep 16, 2024 · Updated last year
- Python bindings for llama.cpp ☆10,058 · Aug 15, 2025 · Updated 7 months ago
- Chat language model that can use tools and interpret the results ☆1,594 · Dec 3, 2025 · Updated 3 months ago
- A comprehensive survey of business use cases of AI that help companies thrive in the digital economy ☆13 · Oct 7, 2020 · Updated 5 years ago
- TypeScript generator that produces llama.cpp grammars directly from TypeScript interfaces ☆141 · Jul 9, 2024 · Updated last year
- Modified beam search with periodic restarts ☆12 · Sep 12, 2024 · Updated last year
- Inference of Mamba, Mamba2 and Mamba3 models in pure C ☆199 · Updated this week
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices ☆10 · Dec 3, 2024 · Updated last year
- This GUI aims to simplify the process of converting GGUF files to llamafile format by providing an intuitive and convenient way for users… ☆14 · Jan 2, 2026 · Updated 2 months ago
- Harness LLMs with multi-agent programming ☆3,932 · Updated this week
- Your Trusty Memory-enabled AI Companion - Simple RAG chatbot optimized for local LLMs | 12 languages supported | OpenAI API compatible ☆350 · Feb 28, 2025 · Updated last year
- Create custom LLMs ☆1,820 · Nov 8, 2025 · Updated 4 months ago
- A guidance compatibility layer for llama-cpp-python ☆36 · Sep 11, 2023 · Updated 2 years ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library ☆1,883 · Jan 28, 2024 · Updated 2 years ago
- A multimodal, function-calling-powered LLM web UI ☆215 · Sep 23, 2024 · Updated last year
- Enforce the output format (JSON Schema, regex, etc.) of a language model ☆1,994 · Aug 24, 2025 · Updated 6 months ago
- CLI tool to quantize GGUF, GPTQ, AWQ, HQQ and EXL2 models ☆79 · Dec 17, 2024 · Updated last year
- ☆1,215 · Dec 22, 2025 · Updated 2 months ago
- Simple agent framework using Ollama tool calling ☆10 · Aug 27, 2024 · Updated last year
- ☆134 · Mar 14, 2026 · Updated last week
- The official API server for Exllama. OAI-compatible, lightweight, and fast ☆1,154 · Mar 13, 2026 · Updated last week
- ☆337 · Mar 5, 2026 · Updated 2 weeks ago
- An AI assistant beyond the chat box ☆330 · Mar 11, 2024 · Updated 2 years ago
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 Alternative to projects like llm-d, Docker Model R… ☆1,483 · Updated this week
- Converts JSON Schema to GBNF grammar for use with llama.cpp ☆55 · Nov 27, 2023 · Updated 2 years ago
- A tool for generating function arguments and choosing which function to call with local LLMs ☆439 · Mar 12, 2024 · Updated 2 years ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Jan 7, 2026 · Updated 2 months ago
- ☆19 · Jun 5, 2023 · Updated 2 years ago
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆806 · Feb 9, 2026 · Updated last month
- Python package wrapping llama.cpp for on-device LLM inference ☆101 · Oct 12, 2025 · Updated 5 months ago
- A simple experiment letting two local LLMs have a conversation about anything! ☆112 · Jul 3, 2024 · Updated last year
- ☆38 · Mar 12, 2024 · Updated 2 years ago
- entropix-style sampling + GUI ☆27 · Oct 30, 2024 · Updated last year
- A library for working with GBNF files ☆29 · Nov 2, 2025 · Updated 4 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆839 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,468 · Mar 4, 2026 · Updated 2 weeks ago