chu-tianxiang / llama-cpp-torch

llama.cpp to PyTorch Converter

☆34

Alternatives and similar repositories for llama-cpp-torch

Users who are interested in llama-cpp-torch are comparing it to the libraries listed below.

chu-tianxiang / QuIP-for-all
QuIP quantization
☆54 · Updated last year
kroggen / mamba.c
Inference of Mamba models in pure C
☆189 · Updated last year
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 · Updated 9 months ago
abetlen / ggml-python
Python bindings for ggml
☆142 · Updated 11 months ago
BlinkDL / fast.c
Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆72 · Updated 6 months ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆87 · Updated this week
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94 · Updated 8 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 · Updated last year
rafacelente / bllama
1.58-bit LLaMa model
☆81 · Updated last year
Cornell-RelaxML / qtip
☆145 · Updated last month
IST-DASLab / Quartet
☆75 · Updated last month
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 · Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128 · Updated 2 years ago
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆142 · Updated this week
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198 · Updated last year
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆191 · Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 · Updated 9 months ago
Infini-AI-Lab / Sequoia
Scalable and robust tree-based speculative decoding algorithm
☆354 · Updated 6 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆153 · Updated last year
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆91 · Updated last year
whyNLP / LCKV
Layer-Condensed KV cache with 10 times larger batch size, fewer params and less computation. Dramatic speedup with better task performance…
☆151 · Updated 4 months ago
huggingface / optimum-amd
AMD-related optimizations for transformer models
☆81 · Updated last month
keeeeenw / MicroLlama
Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget
☆153 · Updated 2 weeks ago
catid / bitnet_cpu
Experiments with BitNet inference on CPU
☆54 · Updated last year
euclaise / supertrainer2000
☆49 · Updated last year
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆134 · Updated 6 months ago
mlc-ai / llm-perf-bench
☆120 · Updated last year
IST-DASLab / SparseFinetuning
Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry
☆42 · Updated last year
qwopqwop200 / gptqlora
GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ
☆103 · Updated 2 years ago
tiiuae / onebitllms
Lightweight toolkit to train and fine-tune 1.58-bit language models
☆82 · Updated 2 months ago