ggml-org / llama.vim

Vim plugin for LLM-assisted code/text completion

☆97

Related projects

Alternatives and complementary repositories for llama.vim

ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆93 Updated this week
trzy / llava-cpp-server
LLaVA server (llama.cpp).
☆177 Updated last year
kroggen / mamba.c
Inference of Mamba models in pure C
☆178 Updated 8 months ago
herrera-luis / vision-core-ai
Demo Python script app to interact with a llama.cpp server using the Whisper API, microphone, and webcam devices.
☆45 Updated last year
antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆244 Updated 4 months ago
abetlen / ggml-python
Python bindings for ggml
☆132 Updated 2 months ago
4dh / GRDN
GRDN.AI app for garden optimization
☆69 Updated 9 months ago
vikhyat / mixtral-inference
inference code for mixtral-8x7b-32kseqlen
☆98 Updated 11 months ago
ggml-org / p1
LLM-based code completion engine
☆175 Updated last year
ggerganov / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆31 Updated last year
Figura-Labs-Inc / telegraf_nv_export
Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings.
☆61 Updated 4 months ago
kolinko / effort
An implementation of bucketMul LLM inference
☆214 Updated 4 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆51 Updated 9 months ago
geohot / tinydreamer
An implementation of delta-iris in tinygrad
☆71 Updated 3 months ago
taylorai / mlx_embedding_models
run embeddings in MLX
☆72 Updated last month
danielgross / ggml-k8s
Run GGML models with Kubernetes.
☆173 Updated 11 months ago
sumo43 / loopvlm
run paligemma in real time
☆123 Updated 6 months ago
wangchou / callCoreMLFromCppOrPython
example of using CoreML from c++
☆22 Updated last year
vdesai2014 / inference-optimization-blog-post
☆83 Updated 8 months ago
okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆37 Updated 3 months ago
Chillee / llm.c
LLM training in simple, raw C/CUDA
☆17 Updated 6 months ago
ptsochantaris / emeltal
Local ML voice chat using high-end models.
☆146 Updated last week
kayvr / token-hawk
WebGPU LLM inference tuned by hand
☆147 Updated last year
monatis / lmm.cpp
Inference of Large Multimodal Models in C/C++: LLaVA and others.
☆46 Updated last year
ggerganov / bark.cpp
Port of Suno AI's Bark in C/C++ for fast inference
☆54 Updated 7 months ago
umuthopeyildirim / DOOM-Mistral
Mistral7B playing DOOM
☆122 Updated 4 months ago
ml-explore / mlx-c
C API for MLX
☆79 Updated this week
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆50 Updated 7 months ago
sdan / selfextend
an implementation of Self-Extend, to expand the context window via grouped attention
☆118 Updated 10 months ago
wozeparrot / tinyrwkv
tinygrad port of the RWKV large language model.
☆43 Updated 5 months ago