leloykun / llama2.cpp

Inference Llama 2 in one file of pure C++

☆79

Related projects ⓘ

Alternatives and complementary repositories for llama2.cpp

gevtushenko / llm.c
LLM training in simple, raw C/CUDA
☆86Updated 6 months ago
mani-kantap / llm-inference-solutions
A collection of all available inference solutions for the LLMs
☆72Updated last month
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆89Updated this week
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated 3 weeks ago
gf712 / gpt2-cpp
GPT2 implementation in C++ using Ort
☆24Updated 3 years ago
cindysridykhan / instruct_storyteller_tinyllama2
Training and Fine-tuning an llm in Python and PyTorch.
☆41Updated last year
kroggen / mamba.c
Inference of Mamba models in pure C
☆177Updated 8 months ago
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆29Updated 6 months ago
likejazz / llama3.cuda
llama3.cuda is a pure C/CUDA implementation for Llama 3 model.
☆307Updated 5 months ago
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆89Updated this week
AlpinDale / QuIP-for-Llama
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models
☆36Updated last year
wejoncy / QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily.
☆148Updated last month
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆229Updated 7 months ago
AmeyaWagh / llama2.cpp
Inference Llama 2 in C++
☆45Updated 6 months ago
abetlen / ggml-python
Python bindings for ggml
☆132Updated 2 months ago
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆55Updated 3 weeks ago
mlc-ai / llm-perf-bench
☆114Updated 6 months ago
golololologol / LLM-Distillery
A pipeline for LLM knowledge distillation
☆77Updated 3 months ago
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆177Updated 5 months ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆46Updated this week
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆261Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆117Updated last year
wozeparrot / tinyrwkv
tinygrad port of the RWKV large language model.
☆43Updated 4 months ago
RobinQu / instinct.cpp
instinct.cpp provides ready to use alternatives to OpenAI Assistant API and built-in utilities for developing AI Agent applications (RAG,…
☆37Updated 4 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆250Updated last month
efeslab / fiddler
Fast Inference of MoE Models with CPU-GPU Orchestration
☆170Updated 2 weeks ago
GreenBitAI / green-bit-llm
A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.
☆73Updated 3 weeks ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆46Updated 7 months ago
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆82Updated 6 months ago