okuvshynov / llama_duoLinks

asynchronous/distributed speculative evaluation for llama3

☆38

Alternatives and similar repositories for llama_duo

Users that are interested in llama_duo are comparing it to the libraries listed below

Sorting:

kroggen / mamba.c
Inference of Mamba models in pure C
☆192Updated last year
google / minja
A minimalistic C++ Jinja templating engine for LLM chat templates
☆193Updated last month
abetlen / ggml-python
Python bindings for ggml
☆146Updated last year
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆107Updated 9 months ago
iamlemec / bert.cpp
GGML implementation of BERT model with Python bindings and quantization.
☆55Updated last year
spirobel / bunny-llama
iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh
☆50Updated 2 years ago
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54Updated last year
LeetArxiv / Finite-Field-Assembly
The Finite Field Assembly Programming Language
☆36Updated 5 months ago
gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆107Updated last year
jameswdelancey / llama3.c
A faithful clone of Karpathy's llama2.c (one file inference, zero dependency) but fully functional with LLaMA 3 8B base and instruct mode…
☆139Updated last week
nomic-ai / kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). …
☆52Updated 8 months ago
ngxson / ggml-easy
Thin wrapper around GGML to make life easier
☆40Updated 4 months ago
KevlarKanou / rwkv7.c
Inference RWKV v7 in pure C.
☆40Updated 3 weeks ago
wozeparrot / tinyrwkv
tinygrad port of the RWKV large language model.
☆44Updated 7 months ago
LaurieWired / BenchmarkCustomPTX
Custom PTX Instruction Benchmark
☆131Updated 8 months ago
gf712 / gpt2-cpp
GPT2 implementation in C++ using Ort
☆26Updated 4 years ago
antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆292Updated 2 months ago
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110Updated last year
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆295Updated last year
abhisheknair10 / llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
☆48Updated 7 months ago
ggml-org / p1
LLM-based code completion engine
☆190Updated 9 months ago
ml-explore / mlx-c
C API for MLX
☆144Updated last month
mscheong01 / speculative_decoding.c
minimal C implementation of speculative decoding based on llama2.c
☆25Updated last year
fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆152Updated 10 months ago
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆193Updated last year
chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆34Updated last year
ggerganov / stable-diffusion.cpp
Stable Diffusion in pure C/C++
☆60Updated 2 years ago
lukasVierling / FaceRWKV
Course Project for COMP4471 on RWKV
☆17Updated last year
tinygrad / gpuctypes
ctypes wrappers for HIP, CUDA, and OpenCL
☆130Updated last year
99991 / pygguf
GGUF parser in Python
☆28Updated last year