google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆43 · Updated last week
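As a rough illustration of what an LLM chat template does (this is not minja's actual API, just a hand-rolled sketch), the C++ snippet below turns a list of role-tagged messages into a single ChatML-style prompt string; minja instead produces this kind of output by evaluating the Jinja template that ships with the model.

```cpp
// Illustrative only: NOT minja's API. Shows the transformation an LLM chat
// template performs -- rendering role-tagged messages into one prompt string.
#include <iostream>
#include <string>
#include <vector>

struct Message {
    std::string role;     // "system", "user", or "assistant"
    std::string content;
};

// Roughly what rendering a ChatML-style Jinja template produces:
//   {% for m in messages %}<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n{% endfor %}
//   {% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}
std::string apply_chatml_template(const std::vector<Message> &messages,
                                  bool add_generation_prompt) {
    std::string prompt;
    for (const auto &m : messages) {
        prompt += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    if (add_generation_prompt) {
        prompt += "<|im_start|>assistant\n";
    }
    return prompt;
}

int main() {
    std::vector<Message> messages = {
        {"system", "You are a helpful assistant."},
        {"user",   "What is a chat template?"},
    };
    std::cout << apply_chatml_template(messages, /*add_generation_prompt=*/true);
}
```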
Alternatives and similar repositories for minja:
Users interested in minja are comparing it to the libraries listed below.
- MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX. ☆85 · Updated 2 months ago
- Benchmarks comparing PyTorch and MLX on Apple Silicon GPUs ☆68 · Updated 5 months ago
- Implementation of nougat that focuses on processing PDFs locally. ☆75 · Updated 8 months ago
- GGML implementation of the BERT model with Python bindings and quantization. ☆52 · Updated 10 months ago
- Light WebUI for lm.rs ☆22 · Updated 2 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆176 · Updated 8 months ago
- Port of Suno AI's Bark in C/C++ for fast inference ☆55 · Updated 8 months ago
- Distributed inference for MLX LLMs ☆77 · Updated 5 months ago
- AirLLM 70B inference with a single 4GB GPU ☆12 · Updated 5 months ago
- A Python package for serving LLMs on OpenAI-compatible API endpoints with prompt caching, using MLX. ☆67 · Updated 3 weeks ago
- Very basic framework for parameterized large language model (Q)LoRA / (Q)Dora fine-tuning using mlx, mlx_lm, and OgbujiPT. Architecture … ☆36 · Updated 2 weeks ago
- Llama implementations benchmarking framework ☆12 · Updated last year
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆36 · Updated this week
- ☆38 · Updated 9 months ago
- A collection of optimizers for MLX ☆25 · Updated last week
- Testing LLM reasoning abilities with family relationship quizzes. ☆54 · Updated last week
- Video+code lecture on building nanoGPT from scratch ☆64 · Updated 6 months ago
- MLX port of xjdr's entropix sampler (mimics jax implementation) ☆59 · Updated 2 months ago
- ☆22 · Updated 3 months ago
- Train your own small bitnet model ☆64 · Updated 2 months ago
- GRDN.AI app for garden optimization ☆70 · Updated 11 months ago
- build your own vector database -- the littlest hnsw ☆47 · Updated this week
- look how they massacred my boy ☆63 · Updated 2 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆35 · Updated this week
- An implementation of Self-Extend to expand the context window via grouped attention ☆118 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 · Updated 2 months ago
- instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,… ☆40 · Updated 6 months ago
- Ongoing research training transformer models at scale ☆34 · Updated 11 months ago
- Routing on Random Forest (RoRF) ☆96 · Updated 3 months ago