jorahn / llama-int8
Quantized inference code for LLaMA models
☆13 · Updated 2 years ago
Alternatives and similar repositories for llama-int8:
Users who are interested in llama-int8 are comparing it to the repositories listed below:
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- A library for incremental loading of large PyTorch checkpoints ☆56 · Updated 2 years ago
- Preprint: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆28 · Updated last year
- Command-line script for inferencing from models such as LLaMA, in a chat scenario, with LoRA adaptations ☆33 · Updated last year
- A playground to make it easy to try crazy things ☆33 · Updated last week
- ☆26 · Updated 2 years ago
- ☆40 · Updated 2 years ago
- A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem. ☆16 · Updated 5 months ago
- ☆22 · Updated 10 months ago
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit ☆63 · Updated last year
- Backend for the diffusion-ui frontend ☆25 · Updated last year
- ☆35 · Updated 2 years ago
- An OpenAI API compatible LLM inference server based on ExLlamaV2. ☆25 · Updated last year
- Experimental sampler to make LLMs more creative ☆30 · Updated last year
- ☆32 · Updated last year
- The code that runs my blog: https://blog.gpt4.org/ ☆10 · Updated 3 years ago
- This repository is about implementing The Personality Cores Conversation System originally developed by Aperture Science, Inc. in the Por… ☆25 · Updated 11 months ago
- Training hybrid models for dummies. ☆20 · Updated 3 months ago
- Simple LLM inference server ☆20 · Updated 10 months ago
- A fork of llama3.c used to do some R&D on inferencing ☆20 · Updated 4 months ago
- ☆27 · Updated last year
- ☆50 · Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆43 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆56 · Updated last year
- ☆28 · Updated last year
- Rust bindings for CTranslate2 ☆14 · Updated last year
- ☆39 · Updated 2 years ago
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- Chatbot that answers frequently asked questions in French, English, and Tunisian using the Rasa NLU framework and RWKV-4-Raven ☆13 · Updated last year
- Senna is an advanced AI-powered search engine designed to provide users with immediate answers to their queries by leveraging natural lan… ☆19 · Updated 7 months ago