google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆135 Updated last week
Alternatives and similar repositories for minja:
Users who are interested in minja compare it to the repositories listed below:
- Inference of Mamba models in pure C ☆188 Updated last year
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- asynchronous/distributed speculative evaluation for llama3 ☆39 Updated 8 months ago
- LLM training in simple, raw C/CUDA ☆94 Updated last year
- GGUF implementation in C as a library and a tools CLI program ☆269 Updated 3 months ago
- ☆209 Updated 3 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆56 Updated last year
- Python bindings for ggml ☆140 Updated 8 months ago
- Inference Llama/Llama2/Llama3 Models in NumPy ☆20 Updated last year
- First token cutoff sampling inference example ☆30 Updated last year
- Experiments with BitNet inference on CPU ☆53 Updated last year
- Learning about CUDA by writing PTX code. ☆128 Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 Updated last month
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7) ☆66 Updated last month
- Benchmark your GPU with ease ☆15 Updated this week
- Super-fast Structured Outputs ☆213 Updated this week
- Load compute kernels from the Hub ☆115 Updated 2 weeks ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆204 Updated 11 months ago
- LLM-based code completion engine ☆185 Updated 3 months ago
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆290 Updated 3 months ago
- Benchmarks comparing PyTorch and MLX on Apple Silicon GPUs ☆79 Updated 9 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆125 Updated 9 months ago
- Fast parallel LLM inference for MLX ☆186 Updated 10 months ago
- AI Tensor Engine for ROCm ☆187 Updated this week
- Transformer GPU VRAM estimator ☆59 Updated last year
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆78 Updated 3 months ago
- ☆306 Updated 2 weeks ago
- An implementation of bucketMul LLM inference ☆217 Updated 10 months ago
- Inference Llama 2 in C++ ☆44 Updated last year
- Thin wrapper around GGML to make life easier ☆27 Updated this week