coldlarry / llama2.cpp
Inference Llama 2 in one file of pure C
☆10 · Updated last year
Alternatives and similar repositories for llama2.cpp
Users interested in llama2.cpp are comparing it to the repositories listed below.
- Load and run Llama from safetensors files in C ☆12 · Updated 8 months ago
- A chat UI for Llama.cpp ☆15 · Updated last week
- Inference Llama 2 in one file of pure C++ ☆83 · Updated last year
- Course Project for COMP4471 on RWKV ☆17 · Updated last year
- Minimal C implementation of speculative decoding based on llama2.c ☆24 · Updated last year
- FMS Model Optimizer is a framework for developing reduced-precision neural network models. ☆20 · Updated this week
- ☆75 · Updated 7 months ago
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI agent applications (RAG,… ☆52 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- Explore training for quantized models ☆20 · Updated this week
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆30 · Updated this week
- LLM training in simple, raw C/CUDA ☆99 · Updated last year
- JAX bindings for the flash-attention3 kernels ☆11 · Updated 11 months ago
- RWKV-LM-V7 (https://github.com/BlinkDL/RWKV-LM) under the Lightning framework ☆35 · Updated last week
- Inference RWKV with multiple supported backends. ☆51 · Updated last week
- A C++ implementation of tinyllama inference on CPU. ☆10 · Updated last year
- ggml implementation of embedding models including SentenceTransformer and BGE ☆58 · Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆22 · Updated last year
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆75 · Updated last week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆51 · Updated 4 months ago
- ☆74 · Updated 3 months ago
- LLM inference in Fortran ☆59 · Updated last year
- GGUF parser in Python ☆28 · Updated 11 months ago
- Fast and memory-efficient exact attention ☆177 · Updated this week
- QuIP quantization ☆54 · Updated last year
- Unit Scaling demo and experimentation code ☆16 · Updated last year
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆38 · Updated last year
- RWKV centralised docs for the community ☆27 · Updated last week
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 · Updated last year
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and easy export to onnx/onnx-runtime ☆174 · Updated 3 months ago