jameswdelancey / llama3.c
A faithful clone of Karpathy's llama2.c (one-file inference, zero dependencies), but fully functional with the LLaMA 3 8B base and instruct models.
☆48 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for llama3.c
- llama.cpp fork with additional SOTA quants and improved performance ☆89 · Updated this week
- Inference of Mamba models in pure C ☆177 · Updated 8 months ago
- Experiments with BitNet inference on CPU ☆50 · Updated 7 months ago
- GPT-2 implementation in C++ using Ort ☆24 · Updated 3 years ago
- llama3.cuda: a pure C/CUDA implementation of the Llama 3 model ☆307 · Updated 5 months ago
- Training and fine-tuning an LLM in Python and PyTorch ☆41 · Updated last year
- Python bindings for ggml ☆132 · Updated 2 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 · Updated 2 months ago
- LLaVA server (llama.cpp) ☆177 · Updated last year
- 1.58-bit LLaMa model ☆79 · Updated 7 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 weeks ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆229 · Updated 7 months ago
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆41 · Updated last month
- Train your own small bitnet model ☆55 · Updated 3 weeks ago
- RWKV in nanoGPT style ☆177 · Updated 5 months ago
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" ☆111 · Updated 6 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆222 · Updated last month
- Fast, multi-threaded matrix multiplication in C ☆181 · Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆86 · Updated 6 months ago
- GGUF implementation in C as a library and a tools CLI program ☆242 · Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆172 · Updated 3 months ago
- ggml implementation of embedding models including SentenceTransformer and BGE ☆52 · Updated 10 months ago
- Advanced quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" ☆245 · Updated this week
- Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of… ☆46 · Updated last month
- Inference Llama 2 in one file of pure C & one file with CUDA ☆16 · Updated last year
- Experimental BitNet implementation ☆60 · Updated 7 months ago
- ☆61 · Updated last week
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 · Updated last year
- Inference Llama 2 in one file of pure C++ ☆79 · Updated last year
- ☆503 · Updated 2 weeks ago