okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆39 · Updated 9 months ago
Alternatives and similar repositories for llama_duo
Users who are interested in llama_duo compare it to the libraries listed below.
- Inference of Mamba models in pure C ☆187 · Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆153 · Updated 3 weeks ago
- Samples of good AI generated CUDA kernels ☆65 · Updated last week
- iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh ☆47 · Updated last year
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆55 · Updated last year
- A fork of llama3.c used to do some R&D on inferencing ☆22 · Updated 5 months ago
- High-Performance SGEMM on CUDA devices ☆94 · Updated 4 months ago
- Explore training for quantized models ☆18 · Updated last week
- The Finite Field Assembly Programming Language ☆37 · Updated 2 weeks ago
- new optimizer ☆20 · Updated 10 months ago
- Python bindings for ggml ☆141 · Updated 9 months ago
- A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode… ☆127 · Updated 10 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated 2 months ago
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- First token cutoff sampling inference example ☆30 · Updated last year
- An LLM inference engine, written in C++ ☆15 · Updated 4 months ago
- ☆30 · Updated last week
- Simple high-throughput inference library ☆115 · Updated 3 weeks ago
- tiny code to access tenstorrent blackhole ☆48 · Updated last week
- Thin wrapper around GGML to make life easier ☆34 · Updated this week
- Web browser version of StarCoder.cpp ☆45 · Updated last year
- Inference Llama/Llama2/Llama3 Models in NumPy ☆21 · Updated last year
- Testing LLM reasoning abilities with family relationship quizzes. ☆61 · Updated 4 months ago
- LLM training in simple, raw C/Metal Shading Language ☆54 · Updated last year
- Make triton easier ☆47 · Updated 11 months ago
- Learning about CUDA by writing PTX code. ☆131 · Updated last year
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆30 · Updated last year
- Port of Suno AI's Bark in C/C++ for fast inference ☆52 · Updated last year