okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆ 38 · Updated 7 months ago
Alternatives and similar repositories for llama_duo:
Users interested in llama_duo are comparing it to the repositories listed below.
- Inference of Mamba models in pure C — ☆ 186 · Updated last year
- LLM training in simple, raw C/CUDA — ☆ 92 · Updated 10 months ago
- Iterate quickly with llama.cpp hot reloading; use the llama.cpp bindings with bun.sh — ☆ 48 · Updated last year
- The Finite Field Assembly Programming Language — ☆ 35 · Updated 3 weeks ago
- Experiments with BitNet inference on CPU — ☆ 53 · Updated 11 months ago
- High-performance SGEMM on CUDA devices — ☆ 86 · Updated last month
- GGML implementation of the BERT model with Python bindings and quantization — ☆ 54 · Updated last year
- General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends) … — ☆ 44 · Updated 3 weeks ago
- Learning about CUDA by writing PTX code — ☆ 123 · Updated last year
- GPU benchmark — ☆ 53 · Updated last month
- Lightweight Llama 3 8B inference engine in CUDA C — ☆ 46 · Updated this week
- A fork of llama3.c used to do some R&D on inferencing — ☆ 19 · Updated 2 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates — ☆ 126 · Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) — ☆ 39 · Updated 10 months ago
- Web browser version of StarCoder.cpp — ☆ 44 · Updated last year
- Explore training for quantized models — ☆ 16 · Updated 2 months ago
- RWKV in nanoGPT style — ☆ 187 · Updated 9 months ago
- A faithful clone of Karpathy's llama2.c (one-file inference, zero dependencies), fully functional with the LLaMA 3 8B base and instruct mode… — ☆ 123 · Updated 7 months ago
- ☆ 40 · Updated last year
- Minimal C implementation of speculative decoding based on llama2.c — ☆ 19 · Updated 7 months ago
- Make Triton easier — ☆ 47 · Updated 9 months ago
- Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024) — ☆ 177 · Updated 10 months ago
- ☆ 46 · Updated 7 months ago
- Course project for COMP4471 on RWKV — ☆ 17 · Updated last year
- Fast matrix multiplications for lookup-table-quantized LLMs — ☆ 231 · Updated 2 weeks ago
- Editor with LLM generation tree exploration — ☆ 65 · Updated last month
- ☆ 200 · Updated last month
- RWKV-7: Surpassing GPT — ☆ 80 · Updated 3 months ago
- Train your own small BitNet model — ☆ 65 · Updated 4 months ago
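Several entries above (llama_duo itself, and the llama2.c-based project) revolve around speculative decoding: a cheap draft model proposes a few tokens and the expensive target model verifies them in one batch, accepting the longest agreeing prefix. The sketch below is a minimal greedy version of that idea, not llama_duo's actual implementation; the `speculative_decode` name and the toy deterministic "models" are illustrative assumptions.

```python
# Minimal sketch of greedy speculative decoding. Toy deterministic
# functions stand in for real draft/target LLMs: each maps a token
# sequence to the next token.

def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Greedy speculative decoding over token lists.

    With greedy sampling, the output is identical to decoding with
    the target model alone; the draft model only adds speed.
    """
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1. Draft model speculates k tokens autoregressively (cheap).
        spec = []
        for _ in range(k):
            spec.append(draft(tokens + spec))
        # 2. Target model verifies each speculated position
        #    (in a real engine this is one batched forward pass).
        accepted = 0
        for i in range(k):
            t = target(tokens + spec[:i])
            if t == spec[i]:
                accepted += 1
            else:
                # 3. On the first mismatch, keep the target's own token.
                spec = spec[:accepted] + [t]
                accepted += 1
                break
        tokens.extend(spec[:accepted])
    return tokens[:len(prompt) + max_new]


# Toy usage: both "models" are simple arithmetic rules that agree often
# enough for speculation to pay off, but not always.
target = lambda seq: (seq[-1] + 1) % 5
draft = lambda seq: (seq[-1] + 1) % 4
out = speculative_decode(target, draft, [0], k=3, max_new=6)
```

llama_duo's contribution, per its description, is doing the draft and verify stages asynchronously and across machines rather than in the sequential loop shown here.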