tangledgroup / llama-cpp-python-exploit
☆15 · Updated last year
Alternatives and similar repositories for llama-cpp-python-exploit:
Users interested in llama-cpp-python-exploit are comparing it to the repositories listed below.
- langchain-prompt-exploit ☆14 · Updated last year
- pandasai-sandbox-exploit ☆13 · Updated last year
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,503 · Updated 10 months ago
- Fast & Simple repository for pre-training and fine-tuning T5-style models ☆1,001 · Updated 8 months ago
- Fine-tune Mistral-7B on 3090s, A100s, H100s ☆710 · Updated last year
- Python bindings for the Transformer models implemented in C/C++ using the GGML library ☆1,859 · Updated last year
- Code for the UltraFastBERT paper ☆518 · Updated last year
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML) ☆567 · Updated last year
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,817 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆791 · Updated this week
- 🤖 A PyTorch library of curated Transformer models and their composable components ☆884 · Updated last year
- Tune any FALCON in 4-bit ☆466 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,242 · Updated last month
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆345 · Updated 8 months ago
- ☆531 · Updated 5 months ago
- Port of Facebook's LLaMA model in C/C++ ☆11 · Updated this week
- ☆412 · Updated last year
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript ☆576 · Updated 9 months ago
- Accelerate your Hugging Face Transformers by 7.6-9x. Native to Hugging Face and PyTorch. ☆683 · Updated 8 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆692 · Updated last year
- fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe… ☆410 · Updated last year
- Inference code for Persimmon-8B ☆415 · Updated last year
- Inference Llama 2 in one file of pure Python ☆415 · Updated 6 months ago
- Implementation of the training framework proposed in Self-Rewarding Language Models, from Meta AI ☆1,378 · Updated last year
- Training LLMs with QLoRA + FSDP ☆1,472 · Updated 5 months ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,470 · Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆686 · Updated 8 months ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,060 · Updated last year
- Serving multiple LoRA finetuned LLMs as one ☆1,054 · Updated 11 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,104 · Updated 2 weeks ago