chu-tianxiang / llama-cpp-torch
llama.cpp to PyTorch Converter
☆33 · Updated last year
Alternatives and similar repositories for llama-cpp-torch
Users interested in llama-cpp-torch are comparing it to the libraries listed below.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week
- Python bindings for ggml ☆140 · Updated 8 months ago
- 1.58-bit LLaMa model ☆81 · Updated last year
- QuIP quantization ☆52 · Updated last year
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code ☆72 · Updated 3 months ago
- QLoRA with enhanced multi-GPU support ☆37 · Updated last year
- Experiments on speculative sampling with Llama models ☆126 · Updated last year
- An implementation of Self-Extend, to expand the context window via grouped attention ☆119 · Updated last year
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs ☆84 · Updated 2 months ago
- Simple high-throughput inference library ☆46 · Updated this week
- ☆49 · Updated last year
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 7 months ago
- RWKV-7: Surpassing GPT ☆84 · Updated 6 months ago
- Layer-Condensed KV cache with 10 times larger batch size, fewer parameters, and less computation. Dramatic speedup with better task performance… ☆149 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆112 · Updated this week
- ☆119 · Updated last year
- ☆131 · Updated last month
- Repository for CPU kernel generation for LLM inference ☆26 · Updated last year
- Tree Attention: Topology-aware decoding for long-context attention on GPU clusters