chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆28 · Updated 9 months ago
Alternatives and similar repositories for llama-cpp-torch:
Users who are interested in llama-cpp-torch are comparing it to the libraries listed below:
- QuIP quantization ☆48 · Updated 10 months ago
- ☆52 · Updated 7 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆133 · Updated this week
- 1.58-bit LLaMa model ☆80 · Updated 9 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 months ago
- Inference of Mamba models in pure C ☆183 · Updated 11 months ago
- Python bindings for ggml ☆136 · Updated 4 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆190 · Updated 6 months ago
- QLoRA with Enhanced Multi GPU Support ☆36 · Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆55 · Updated 8 months ago
- RWKV-7: Surpassing GPT ☆73 · Updated 2 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆212 · Updated 9 months ago
- ☆49 · Updated 10 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆113 · Updated last month
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆36 · Updated last year
- Low-Rank adapter extraction for fine-tuned transformers models ☆167 · Updated 8 months ago
- An implementation of Self-Extend to expand the context window via grouped attention ☆118 · Updated last year
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆155 · Updated last week
- Micro Llama is a small Llama-based model with 300M parameters trained from scratch on a $500 budget ☆140 · Updated 10 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆99 · Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 8 months ago
- Data preparation code for Amber 7B LLM ☆84 · Updated 8 months ago
- Testing LLM reasoning abilities with family relationship quizzes. ☆57 · Updated this week
- ☆151 · Updated 6 months ago
- ☆100 · Updated last month
- Google TPU optimizations for transformers models ☆90 · Updated last week
- ☆65 · Updated 8 months ago
- PB-LLM: Partially Binarized Large Language Models ☆150 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆66 · Updated this week
- tinygrad port of the RWKV large language model. ☆44 · Updated 7 months ago