fairydreaming / llama.cpp
LLM inference in C/C++
☆21 Updated 10 months ago
Alternatives and similar repositories for llama.cpp
Users who are interested in llama.cpp are comparing it to the libraries listed below.
- ☆109 Updated 5 months ago
- CLI tool to quantize GGUF, GPTQ, AWQ, HQQ and EXL2 models ☆78 Updated last year
- Automatically quantize GGUF models ☆219 Updated last month
- Easily view and modify JSON datasets for large language models ☆87 Updated 8 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆49 Updated 3 months ago
- LLM inference in C/C++ ☆104 Updated last week
- A pipeline parallel training script for LLMs. ☆166 Updated 9 months ago
- ☆135 Updated last month
- ☆51 Updated last year
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text ☆81 Updated this week
- A fast batching API to serve LLM models ☆189 Updated last year
- Distributed inference for MLX LLMs ☆100 Updated last year
- ☆51 Updated 11 months ago
- private-machine is an AI companion system with emotion, needs and goals simulation. Very silly, not based on real science. ☆28 Updated 2 months ago
- 1.58-bit LLaMa model ☆82 Updated last year
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆88 Updated this week
- LLM based agents with proactive interactions, long-term memory, external tool integration, and local deployment capabilities. ☆107 Updated 6 months ago
- Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper. ☆29 Updated 10 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆118 Updated last year
- ☆22 Updated last year
- Super simple Python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆29 Updated last month
- Low-Rank adapter extraction for fine-tuned transformers models ☆180 Updated last year
- Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com. ☆119 Updated last year
- Kosmos-2.5 is a cutting-edge Multimodal-LLM (MLLM) specializing in image OCR. However, its stringent software requirements & Python-scrip… ☆67 Updated last year
- ☆166 Updated 5 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆63 Updated last year
- Serving LLMs in the HF-Transformers format via a PyFlask API ☆72 Updated last year
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching using MLX. ☆100 Updated 7 months ago
- REAP: Router-weighted Expert Activation Pruning for SMoE compression ☆222 Updated last month
- Run ollama & GGUF easily with a single command ☆52 Updated last year