okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆37 Updated 6 months ago
Alternatives and similar repositories for llama_duo:
Users who are interested in llama_duo are comparing it to the libraries listed below.
- Inference of Mamba models in pure C ☆183 Updated 11 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆53 Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆120 Updated this week
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆45 Updated last week
- A fork of llama3.c used to do some R&D on inferencing ☆18 Updated 2 months ago
- Experiments with BitNet inference on CPU ☆53 Updated 10 months ago
- Python bindings for ggml ☆137 Updated 5 months ago
- The Finite Field Assembly Programming Language ☆34 Updated last week
- High-Performance SGEMM on CUDA devices ☆76 Updated last month
- Explore training for quantized models ☆15 Updated last month
- Course Project for COMP4471 on RWKV ☆17 Updated last year
- LLM training in simple, raw C/CUDA ☆91 Updated 9 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 9 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆45 Updated 4 months ago
- tinygrad port of the RWKV large language model. ☆44 Updated 8 months ago
- GPU benchmark ☆52 Updated 3 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆155 Updated this week
- Iterate quickly with llama.cpp hot reloading. Use the llama.cpp bindings with bun.sh ☆47 Updated last year
- ☆44 Updated 7 months ago
- extensible collectives library in triton ☆83 Updated 4 months ago
- Train your own small bitnet model ☆64 Updated 4 months ago
- ☆67 Updated 3 months ago
- Tensor library for Zig ☆11 Updated 3 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆43 Updated last week
- RWKV-7: Surpassing GPT ☆79 Updated 3 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆26 Updated last year
- ☆53 Updated 6 months ago
- A relatively basic implementation of RWKV in Rust written by someone with very little math and ML knowledge. Supports 32, 8 and 4 bit eva… ☆93 Updated last year
- minimal C implementation of speculative decoding based on llama2.c ☆18 Updated 7 months ago