jameswdelancey / llama3.c
A faithful clone of Karpathy's llama2.c (one-file inference, zero dependencies), but fully functional with the LLaMA 3 8B base and instruct models.
☆48 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for llama3.c
- llama.cpp fork with additional SOTA quants and improved performance ☆89 · Updated this week
- Inference of Mamba models in pure C ☆177 · Updated 8 months ago
- Experiments with BitNet inference on CPU ☆50 · Updated 7 months ago
- GPT-2 implementation in C++ using Ort ☆24 · Updated 3 years ago
- llama3.cuda: a pure C/CUDA implementation of the Llama 3 model ☆307 · Updated 5 months ago
- Training and fine-tuning an LLM in Python and PyTorch ☆41 · Updated last year
- Python bindings for ggml ☆132 · Updated 2 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 2 months ago
- LLaVA server (llama.cpp) ☆177 · Updated last year
- 1.58-bit LLaMa model ☆79 · Updated 7 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 weeks ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆229 · Updated 7 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆41 · Updated last month
- Train your own small bitnet model ☆55 · Updated 3 weeks ago
- RWKV in nanoGPT style ☆177 · Updated 5 months ago
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆111 · Updated 6 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆222 · Updated last month
- Fast, multi-threaded matrix multiplication in C ☆181 · Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆86 · Updated 6 months ago
- GGUF implementation in C as a library and a tools CLI program ☆242 · Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆172 · Updated 3 months ago
- ggml implementation of embedding models including SentenceTransformer and BGE ☆52 · Updated 10 months ago
- Advanced quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" ☆245 · Updated this week
- Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of… ☆46 · Updated last month
- Inference Llama 2 in one file of pure C & one file with CUDA ☆16 · Updated last year
- Experimental BitNet implementation ☆60 · Updated 7 months ago
- ☆61 · Updated last week
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- Inference Llama 2 in one file of pure C++ ☆79 · Updated last year
- ☆503 · Updated 2 weeks ago