foldl / chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU)
☆376 · Updated this week
Related projects
Alternatives and complementary repositories for chatllm.cpp
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆732 · Updated this week
- C++ implementation of Qwen-LM ☆551 · Updated 10 months ago
- ggml implementation of BERT ☆464 · Updated 8 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 2 months ago
- [NeurIPS'24 Spotlight] To speed up long-context LLMs' inference, approximates the attention with dynamic sparse computation, which reduces in… ☆776 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆129 · Updated 4 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference ☆1,745 · Updated last month
- Open Source Text Embedding Models with OpenAI Compatible API ☆131 · Updated 3 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆89 · Updated this week
- Official implementation of Half-Quadratic Quantization (HQQ) ☆697 · Updated last week
- Python bindings for ggml ☆132 · Updated 2 months ago
- Manage GPU clusters for running LLMs ☆551 · Updated this week
- Low-bit LLM inference on CPU with lookup table ☆563 · Updated 2 weeks ago
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆498 · Updated this week
- Comparison of Language Model Inference Engines ☆190 · Updated 2 months ago
- Automatically quantize GGUF models ☆137 · Updated this week
- Small language models for Chinese-language scenarios, llama2.c-zh ☆143 · Updated 8 months ago
- Finetune ALL LLMs with ALL Adapters on ALL Platforms! ☆306 · Updated last month
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆222 · Updated last month
- Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang ☆118 · Updated this week
- CLIP inference in plain C/C++ with no extra dependencies ☆457 · Updated 2 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆135 · Updated 2 months ago
- Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability ☆514 · Updated 4 months ago
- ☆873 · Updated 4 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆629 · Updated last month
- Efficient AI Inference & Serving ☆456 · Updated 10 months ago
- ggml implementation of embedding models including SentenceTransformer and BGE ☆52 · Updated 10 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs) ☆236 · Updated 7 months ago
- Inference of Mamba models in pure C ☆177 · Updated 8 months ago