coldlarry / llama2.cpp
Inference Llama 2 in one file of pure C
☆12 · Updated 2 years ago
Alternatives and similar repositories for llama2.cpp
Users interested in llama2.cpp are comparing it to the repositories listed below.
- Inference Llama 2 in one file of pure C++ ☆87 · Updated 2 years ago
- A chat UI for Llama.cpp ☆15 · Updated 2 months ago
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI agent applications (RAG,… ☆57 · Updated last year
- FMS Model Optimizer is a framework for developing reduced-precision neural network models. ☆20 · Updated last month
- Load and run Llama from safetensors files in C ☆15 · Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 · Updated last year
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆64 · Updated 7 months ago
- ☆79 · Updated last year
- LLM inference in Fortran ☆65 · Updated last year
- ☆71 · Updated 10 months ago
- Inference of Mamba and Mamba2 models in pure C ☆196 · Updated 2 weeks ago
- ☆13 · Updated last year
- General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends).… ☆51 · Updated 11 months ago
- From-scratch C implementation of the multi-head latent attention used in the DeepSeek-V3 technical paper. ☆19 · Updated last year
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆42 · Updated 7 months ago
- Port of Facebook's LLaMA model in C/C++ ☆23 · Updated last year
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments. ☆86 · Updated 2 weeks ago
- Python bindings for ggml ☆147 · Updated last year
- A C++ framework for efficient training & fine-tuning of LLMs ☆27 · Updated this week
- Minimal C implementation of speculative decoding based on llama2.c ☆25 · Updated last year
- OpenVINO Tokenizers extension ☆48 · Updated last week
- Inference RWKV v7 in pure C. ☆44 · Updated 3 months ago
- GPT-2 implementation in C++ using Ort ☆26 · Updated 5 years ago
- Ahead-of-Time (AOT) Triton Math Library ☆88 · Updated last week
- GGML implementation of the BERT model with Python bindings and quantization. ☆58 · Updated last year
- Local inference of Llama and Qwen (Tongyi Qianwen) large models implemented with Apple Metal ☆10 · Updated last year
- ☆70 · Updated 2 years ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆41 · Updated 2 years ago