google / minjaLinks

A minimalistic C++ Jinja templating engine for LLM chat templates

☆197

Alternatives and similar repositories for minja

Users that are interested in minja are comparing it to the libraries listed below

Sorting:

kroggen / mamba.c
Inference of Mamba models in pure C
☆192Updated last year
antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆295Updated 2 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆56Updated last year
abetlen / ggml-python
Python bindings for ggml
☆146Updated last year
ml-explore / mlx-c
C API for MLX
☆150Updated last month
nomic-ai / kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). …
☆52Updated 9 months ago
ngxson / ggml-easy
Thin wrapper around GGML to make life easier
☆40Updated 2 weeks ago
facebookresearch / fastgen
Simple high-throughput inference library
☆149Updated 6 months ago
exo-explore / mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
☆225Updated last year
okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆38Updated last year
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆108Updated last year
apple / ml-recurrent-drafter
☆218Updated 9 months ago
huggingface / xet-core
xet client tech, used in huggingface_hub
☆322Updated this week
jameswdelancey / llama3.c
A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode…
☆140Updated last month
fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆152Updated 11 months ago
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆110Updated 10 months ago
hscspring / llama.np
Inference Llama/Llama2/Llama3 Modes in NumPy
☆21Updated last year
abhisheknair10 / llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
☆52Updated 7 months ago
HazyResearch / HipKittens
Fast and Furious AMD Kernels
☆278Updated this week
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆298Updated last year
ggml-org / p1
LLM-based code completion engine
☆190Updated 9 months ago
BlinkDL / fast.c
Prepare for DeekSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆73Updated 9 months ago
andrewkchan / deepseek.cpp
CPU inference for the DeepSeek family of large language models in C++
☆313Updated last month
LucasSte / MLX-vs-Pytorch
Benchmarks comparing PyTorch and MLX on Apple Silicon GPUs
☆89Updated last year
geohot / tt-tiny
tiny code to access tenstorrent blackhole
☆61Updated 5 months ago
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆210Updated this week
sgl-project / sgl-project.github.io
This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang/tree/main/docs.
☆89Updated this week
ml-explore / mlx-onnx
MLX support for the Open Neural Network Exchange (ONNX)
☆62Updated last year
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54Updated last year
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆147Updated last year