JohannesGaessler / llama.cpp
Port of Facebook's LLaMA model in C/C++
★11 · Updated this week
Alternatives and similar repositories for llama.cpp:
Users who are interested in llama.cpp are comparing it to the libraries listed below:
- Falcon LLM ggml framework with CPU and GPU support ★246 · Updated last year
- C++ implementation for 💫StarCoder ★450 · Updated last year
- A gradio web UI for running Large Language Models like GPT-J 6B, OPT, GALACTICA, LLaMA, and Pygmalion. ★308 · Updated last year
- A multimodal, function calling powered LLM webui. ★213 · Updated 4 months ago
- Automatically quantize GGUF models ★151 · Updated this week
- A fast batching API to serve LLM models ★180 · Updated 9 months ago
- An autonomous AI agent extension for Oobabooga's web ui ★176 · Updated last year
- An AI assistant beyond the chat box. ★317 · Updated 10 months ago
- Python bindings for ggml ★136 · Updated 4 months ago
- An extension for oobabooga/text-generation-webui that enables the LLM to search the web using DuckDuckGo ★201 · Updated this week
- An Autonomous LLM Agent that runs on Wizcoder-15B ★338 · Updated 3 months ago
- Web UI for ExLlamaV2 ★470 · Updated this week
- TheBloke's Dockerfiles ★301 · Updated 10 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ★123 · Updated last year
- ggml implementation of BERT ★476 · Updated 11 months ago
- A discord bot that roleplays! ★147 · Updated last year
- Simple, elegant LLM Chat Inference ★24 · Updated 7 months ago
- CLIP inference in plain C/C++ with no extra dependencies ★475 · Updated 5 months ago
- fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe… ★408 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ★522 · Updated last month
- My personal fork of koboldcpp where I hack in experimental samplers. ★43 · Updated 8 months ago
- function calling-based LLM agents ★283 · Updated 4 months ago
- Extension for Text Generation Webui based on EdgeGPT, a reverse engineered API of Microsoft's Bing Chat AI ★124 · Updated last year
- The RunPod worker template for serving our large language model endpoints. Powered by vLLM. ★273 · Updated this week
- Automated prompting and scoring framework to evaluate LLMs using updated human knowledge prompts ★111 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ★1,454 · Updated last week
- Native GUI for several AI services plus llama.cpp local AIs. ★108 · Updated last year
- A guidance language for controlling large language models. ★44 · Updated last year
- Run inference on replit-3B code instruct model using CPU ★154 · Updated last year
- Real-time Fallacy Detection using OpenAI whisper and ChatGPT/LLaMA/Mistral ★110 · Updated last year