foldl / chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)
⭐785 · Updated this week
Alternatives and similar repositories for chatllm.cpp
Users interested in chatllm.cpp are comparing it to the libraries listed below.
- An innovative library for efficient LLM inference via low-bit quantization ⭐352 · Updated last year
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ⭐839 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ⭐1,587 · Updated this week
- LM inference server implementation based on *.cpp. ⭐295 · Updated 2 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ⭐615 · Updated 11 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ⭐1,563 · Updated 10 months ago
- Low-bit LLM inference on CPU/NPU with lookup table ⭐916 · Updated 8 months ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second (see the memory-estimation sketch after this list) ⭐238 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) ⭐912 · Updated last month
- LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU vi… ⭐1,007 · Updated this week
- Comparison of Language Model Inference Engines ⭐239 · Updated last year
- CLIP inference in plain C/C++ with no extra dependencies ⭐549 · Updated 7 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ⭐626 · Updated last week
- Automatically quantize GGUF models ⭐219 · Updated last month
- CPU inference for the DeepSeek family of large language models in C++ ⭐317 · Updated 4 months ago
- ggml implementation of BERT ⭐498 · Updated last year
- Large-scale LLM inference engine ⭐1,641 · Updated 2 weeks ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ⭐854 · Updated last year
- VPTQ, a flexible and extreme low-bit quantization algorithm ⭐674 · Updated 9 months ago
- C++ implementation of Qwen-LM ⭐616 · Updated last year
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies. ⭐156 · Updated 7 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ⭐306 · Updated last year
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference ⭐987 · Updated last month
- A high-performance inference system for large language models, designed for production environments. ⭐491 · Updated last month
- Run DeepSeek-R1 GGUFs on KTransformers ⭐260 · Updated 11 months ago
- Generative AI extensions for onnxruntime ⭐953 · Updated this week
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ⭐350 · Updated 9 months ago
- Python bindings for ggml ⭐147 · Updated last year
- Universal cross-platform tokenizers binding to HF and sentencepiece ⭐451 · Updated 2 weeks ago
- TinyChatEngine: On-Device LLM Inference Library ⭐941 · Updated last year
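
Many of the repositories above (the low-bit quantization library, HQQ, VPTQ, the GPTQ-style toolkit) revolve around the same core idea: storing weights as small integers plus a per-block scale. The following is a minimal, library-independent sketch of symmetric 4-bit blockwise quantization; the block contents and block size are illustrative, not drawn from any listed project.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Symmetric 4-bit blockwise quantization: every block of weights shares a
// single fp32 scale, and each weight is stored as an integer in [-7, 7].
struct QBlock {
    float scale;
    std::vector<int8_t> q;  // 4-bit values, held in int8_t for simplicity
};

QBlock quantize_block(const std::vector<float>& w) {
    float amax = 0.0f;
    for (float x : w) amax = std::fmax(amax, std::fabs(x));
    QBlock b;
    b.scale = amax / 7.0f;  // map the largest magnitude onto the int4 limit
    for (float x : w)
        b.q.push_back((int8_t)std::lround(b.scale > 0.0f ? x / b.scale : 0.0f));
    return b;
}

// Dequantization is a single multiply per weight.
float dequantize(const QBlock& b, size_t i) { return b.q[i] * b.scale; }

int main() {
    std::vector<float> w = {0.12f, -0.80f, 0.33f, 0.02f};
    QBlock b = quantize_block(w);
    for (size_t i = 0; i < w.size(); ++i)
        printf("%+.3f -> %+d -> %+.3f\n", w[i], (int)b.q[i], dequantize(b, i));
    return 0;
}
```

Real schemes layer more on top of this (nested scales in GGUF's K-quants, zero-points in HQQ, vector codebooks in VPTQ), but the storage-versus-precision trade-off is the same.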
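
The GGUF checker entry above estimates memory as, roughly, the quantized weight size plus the KV cache. Here is a back-of-the-envelope sketch of that arithmetic; every number below is an illustrative assumption for a 7B-class model, not read from a real GGUF file.

```cpp
#include <cstdio>

// Rough memory estimate for running a quantized llama-style model:
//   weights  ~ n_params * bits_per_weight / 8
//   KV cache ~ 2 (K and V) * n_layers * n_ctx * n_kv_heads * head_dim * bytes/elem
int main() {
    const double n_params        = 7e9;   // total parameter count (assumed)
    const double bits_per_weight = 4.5;   // e.g. a Q4_K-style mixed quantization
    const double n_layers   = 32;
    const double n_ctx      = 4096;       // context window to budget for
    const double n_kv_heads = 8;          // grouped-query attention (assumed)
    const double head_dim   = 128;
    const double kv_bytes   = 2;          // fp16 K/V cache entries

    const double GiB      = 1024.0 * 1024.0 * 1024.0;
    const double weights  = n_params * bits_per_weight / 8.0;
    const double kv_cache = 2.0 * n_layers * n_ctx * n_kv_heads * head_dim * kv_bytes;

    printf("weights : %6.2f GiB\n", weights / GiB);
    printf("KV cache: %6.2f GiB\n", kv_cache / GiB);
    printf("total   : %6.2f GiB\n", (weights + kv_cache) / GiB);
    return 0;
}
```

Compute buffers and runtime scratch add overhead on top, which is why actual checkers read tensor shapes and quantization types from GGUF metadata rather than assuming them.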