ggml-org / llama.vim
Vim plugin for LLM-assisted code/text completion
☆97 · Updated this week
Related projects
Alternatives and complementary repositories for llama.vim
- llama.cpp fork with additional SOTA quants and improved performance ☆93 · Updated this week
- LLaVA server (llama.cpp). ☆177 · Updated last year
- Inference of Mamba models in pure C ☆178 · Updated 8 months ago
- Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices. ☆45 · Updated last year
- GGUF implementation in C as a library and a tools CLI program ☆244 · Updated 4 months ago
- Python bindings for ggml ☆132 · Updated 2 months ago
- GRDN.AI app for garden optimization ☆69 · Updated 9 months ago
- inference code for mixtral-8x7b-32kseqlen ☆98 · Updated 11 months ago
- LLM-based code completion engine ☆175 · Updated last year
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆31 · Updated last year
- Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings. ☆61 · Updated 4 months ago
- An implementation of bucketMul LLM inference ☆214 · Updated 4 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆51 · Updated 9 months ago
- An implementation of delta-iris in tinygrad ☆71 · Updated 3 months ago
- run embeddings in MLX ☆72 · Updated last month
- Run GGML models with Kubernetes. ☆173 · Updated 11 months ago
- run paligemma in real time ☆123 · Updated 6 months ago
- example of using CoreML from c++ ☆22 · Updated last year
- ☆83 · Updated 8 months ago
- asynchronous/distributed speculative evaluation for llama3 ☆37 · Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆17 · Updated 6 months ago
- Local ML voice chat using high-end models. ☆146 · Updated last week
- WebGPU LLM inference tuned by hand ☆147 · Updated last year
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 · Updated last year
- Port of Suno AI's Bark in C/C++ for fast inference ☆54 · Updated 7 months ago
- Mistral7B playing DOOM ☆122 · Updated 4 months ago
- C API for MLX ☆79 · Updated this week
- Experiments with BitNet inference on CPU ☆50 · Updated 7 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆118 · Updated 10 months ago
- tinygrad port of the RWKV large language model. ☆43 · Updated 5 months ago