chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆33 · Updated last year
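As a rough illustration of what a llama.cpp-to-PyTorch conversion involves (this is a hypothetical sketch, not code from this repository): GGML/GGUF checkpoints store most weights block-quantized, e.g. the Q8_0 format keeps 32 int8 values per block together with a single float16 scale, and a converter has to expand those blocks back into dense PyTorch tensors. The function and tensor names below are illustrative assumptions.

```python
# Minimal sketch (assumed, not this repo's code): dequantizing GGML Q8_0
# blocks into a dense PyTorch tensor. Q8_0 stores weights in blocks of 32
# int8 values, each block carrying one float16 scale; the original weight
# is recovered as scale * q.
import torch

QK8_0 = 32  # Q8_0 block size


def dequantize_q8_0(scales: torch.Tensor, qs: torch.Tensor) -> torch.Tensor:
    """scales: (n_blocks,) float16, qs: (n_blocks, QK8_0) int8 -> (n_blocks * QK8_0,) float32."""
    return (scales.to(torch.float32).unsqueeze(-1) * qs.to(torch.float32)).reshape(-1)


# Toy usage: one block of 32 quantized values with scale 0.01.
qs = torch.randint(-127, 128, (1, QK8_0), dtype=torch.int8)
scales = torch.tensor([0.01], dtype=torch.float16)
print(dequantize_q8_0(scales, qs).shape)  # torch.Size([32])
```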
Alternatives and similar repositories for llama-cpp-torch:
Users interested in llama-cpp-torch are comparing it to the libraries listed below.
- QuIP quantization ☆51 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆82 · Updated last month
- Python bindings for ggml ☆140 · Updated 7 months ago
- Evaluating LLMs with Dynamic Data ☆82 · Updated 2 months ago
- 1.58-bit LLaMa model ☆81 · Updated last year
- RWKV-7: Surpassing GPT ☆83 · Updated 5 months ago
- Image Diffusion block merging technique applied to transformer-based Language Models. ☆54 · Updated last year
- Low-Rank adapter extraction for fine-tuned transformers models ☆171 · Updated 11 months ago
- Inference of Mamba models in pure C ☆187 · Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- Google TPU optimizations for transformers models ☆108 · Updated 3 months ago
- inference code for mixtral-8x7b-32kseqlen ☆99 · Updated last year
- Experiments on speculative sampling with Llama models ☆125 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 · Updated 9 months ago
- ☆49 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆274 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆59 · Updated 3 months ago
- an implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆188 · Updated 8 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆103 · Updated last year
- ☆50 · Updated 5 months ago
- ☆46 · Updated 9 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 6 months ago
- RWKV in nanoGPT style ☆189 · Updated 10 months ago
- Demonstration that finetuning a RoPE model on sequences longer than those seen in pre-training extends its context limit ☆63 · Updated last year
- Code for data-aware compression of DeepSeek models ☆20 · Updated 2 weeks ago
- ☆126 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆100 · Updated last week