vgel / repeng
A library for making RepE control vectors
☆560 · Updated 2 months ago
Alternatives and similar repositories for repeng:
Users interested in repeng are comparing it to the libraries listed below.
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆229 · Updated last month
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆347 · Updated 9 months ago
- Sparsify transformers with SAEs and transcoders ☆494 · Updated this week
- ☆412 · Updated last year
- Visualize the intermediate output of Mistral 7B ☆344 · Updated 2 months ago
- Neural Search ☆352 · Updated last week
- utilities for decoding deep representations (like sentence embeddings) back to text ☆777 · Updated last month
- Training Sparse Autoencoders on Language Models ☆669 · Updated this week
- Erasing concepts from neural representations with provable guarantees ☆226 · Updated last month
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆990 · Updated last month
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆601 · Updated 3 months ago
- Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens ☆437 · Updated 9 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆233 · Updated 9 months ago
- ☆273 · Updated last month
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆185 · Updated last year
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆265 · Updated 2 weeks ago
- Generate Synthetic Data Using OpenAI, MistralAI or AnthropicAI ☆224 · Updated 10 months ago
- Representation Engineering: A Top-Down Approach to AI Transparency ☆807 · Updated 7 months ago
- Low-Rank adapter extraction for fine-tuned transformers models ☆171 · Updated 10 months ago
- Inspect: A framework for large language model evaluations ☆827 · Updated this week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆423 · Updated 5 months ago
- ☆501 · Updated 4 months ago
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆470 · Updated last year
- Exact structure out of any language model completion. ☆507 · Updated last year
- Automatically evaluate your LLMs in Google Colab ☆603 · Updated 10 months ago
- ☆445 · Updated 11 months ago
- ☆512 · Updated 7 months ago
- Customizable implementation of the self-instruct paper. ☆1,040 · Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆192 · Updated 5 months ago
- batched loras ☆340 · Updated last year