jorahn / llama-int8
Quantized inference code for LLaMA models
☆13 · Updated last year
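Int8 quantization, the topic of this repo, maps float weights onto 8-bit integers with a per-tensor scale. A minimal NumPy sketch of the idea — the function names are illustrative and not taken from this codebase:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map floats into [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction of the original floats.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-trip error is bounded by half a quantization step (s / 2).
```

Real int8 inference stacks (e.g. LLM.int8) add refinements such as per-channel scales and outlier handling, but the scale-round-clip core is the same.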
Related projects
Alternatives and complementary repositories for llama-int8
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated last year
- Demonstration that finetuning a RoPE model on longer sequences than it was pre-trained on extends the model's context limit ☆63 · Updated last year
- Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆28 · Updated 9 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- GGML implementation of the BERT model with Python bindings and quantization. ☆51 · Updated 9 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆42 · Updated 8 months ago
- Embedding models from Jina AI ☆56 · Updated 10 months ago
- Assign color hues to a collection of text fragments based on embeddings ☆20 · Updated 5 months ago
- Karras et al. (2022) diffusion models for PyTorch ☆19 · Updated 5 months ago
- Command-line script for inferencing from models such as falcon-7b-instruct ☆75 · Updated last year
- Modified Stanford-Alpaca Trainer for Training Replit's Code Model ☆40 · Updated last year
- Image Generation API Server - Similar to https://text-generator.io but for images ☆47 · Updated 2 months ago
- RWKV-7: Surpassing GPT ☆45 · Updated this week
- Pressure testing the context window of open LLMs ☆22 · Updated 2 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated 7 months ago
- Low-Rank Adaptation of Large Language Models clean implementation ☆9 · Updated last year
- Command-line script for inferencing from models such as MPT-7B-Chat ☆102 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆66 · Updated last year
- Command-line script for inferencing from models such as LLaMA, in a chat scenario, with LoRA adaptations ☆33 · Updated last year
- An OpenAI API compatible LLM inference server based on ExLlamaV2. ☆22 · Updated 9 months ago
- Just a simple HowTo for https://github.com/johnsmith0031/alpaca_lora_4bit ☆31 · Updated last year
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. ☆38 · Updated 8 months ago
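Several of the projects listed above revolve around Low-Rank Adaptation (LoRA): the clean LoRA implementation, the LoRA chat inference script, and the alpaca_lora_4bit HowTo. A minimal NumPy sketch of the LoRA forward pass, assuming a frozen weight `W` and trainable low-rank factors `A` and `B` (names and shapes here are illustrative, not taken from any of these repos):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Base projection plus a low-rank update: y = x W^T + (alpha / r) x A^T B^T.

    W: (out, in)  frozen pre-trained weight.
    A: (r, in), B: (out, r) -- the only trained parameters. B is typically
    zero-initialized, so training starts from the unmodified base model.
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W = rng.standard_normal((4, 8))
A = rng.standard_normal((2, 8))   # rank r = 2
B = np.zeros((4, 2))              # zero-init: the adapter is a no-op at start
y = lora_forward(x, W, A, B)
```

With `B` at zero the output equals the base projection `x @ W.T`; only the small `A`/`B` matrices need gradients, which is why LoRA pairs well with the quantized-base-model setups (4-bit/8-bit) featured in this list.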