coldlarry / llama2.cpp
Inference Llama 2 in one file of pure C
☆12 · Updated 2 years ago
Alternatives and similar repositories for llama2.cpp
Users interested in llama2.cpp are comparing it to the repositories listed below.
- Inference Llama 2 in one file of pure C++ ☆87 · Updated 2 years ago
- A chat UI for Llama.cpp ☆15 · Updated 2 months ago
- instinct.cpp provides ready-to-use alternatives to the OpenAI Assistant API and built-in utilities for developing AI agent applications (RAG,… ☆57 · Updated last year
- FMS Model Optimizer is a framework for developing reduced-precision neural network models. ☆20 · Updated last month
- Load and run Llama from safetensors files in C ☆15 · Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 · Updated last year
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆64 · Updated 7 months ago
- ☆79 · Updated last year
- LLM inference in Fortran ☆65 · Updated last year
- ☆71 · Updated 10 months ago
- Inference of Mamba and Mamba2 models in pure C ☆196 · Updated 2 weeks ago
- ☆13 · Updated last year
- General-purpose GPU compute framework built on Vulkan to support thousands of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends).… ☆51 · Updated 11 months ago
- From-scratch C implementation of the multi-head latent attention used in the DeepSeek-V3 technical paper. ☆19 · Updated last year
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- cortex.llamacpp is a high-efficiency C++ inference engine for edge computing. It is a dynamic library that can be loaded by any server a… ☆42 · Updated 7 months ago
- Port of Facebook's LLaMA model in C/C++ ☆23 · Updated last year
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments. ☆86 · Updated 2 weeks ago
- Python bindings for ggml ☆147 · Updated last year
- A C++ framework for efficient training & fine-tuning of LLMs ☆27 · Updated this week
- Minimal C implementation of speculative decoding based on llama2.c ☆25 · Updated last year
- OpenVINO Tokenizers extension ☆48 · Updated last week
- Inference RWKV v7 in pure C. ☆44 · Updated 3 months ago
- GPT-2 implementation in C++ using Ort ☆26 · Updated 5 years ago
- Ahead-of-Time (AOT) Triton Math Library ☆88 · Updated last week
- GGML implementation of the BERT model with Python bindings and quantization. ☆58 · Updated last year
- Local inference of Llama and Qwen (Tongyi Qianwen) large models implemented with Apple Metal ☆10 · Updated last year
- ☆70 · Updated 2 years ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆41 · Updated 2 years ago