poking-agents / modular-public
☆12 Updated this week
Alternatives and similar repositories for modular-public:
Users who are interested in modular-public are comparing it to the libraries listed below.
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆74 Updated this week
- METR Task Standard ☆135 Updated 2 weeks ago
- ☆51 Updated last week
- ☆10 Updated 6 months ago
- ☆79 Updated last week
- ☆247 Updated 6 months ago
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆78 Updated last month
- ☆19 Updated last year
- ☆48 Updated 3 months ago
- Mechanistic Interpretability Visualizations using React ☆220 Updated last month
- ☆25 Updated 9 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 7 months ago
- Draw more samples ☆184 Updated 6 months ago
- Comprehensive analysis of the differences in performance between QLoRA, LoRA, and full finetunes. ☆82 Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆154 Updated 2 months ago
- Extract full next-token probabilities via language model APIs ☆230 Updated 10 months ago
- Sphynx Hallucination Induction ☆51 Updated 5 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 Updated 10 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆13 Updated 2 months ago
- Collection of evals for Inspect AI ☆47 Updated this week
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆206 Updated 11 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆150 Updated 2 months ago
- Machine Learning for Alignment Bootcamp ☆70 Updated 2 years ago
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆18 Updated last year
- Discovering Data-driven Hypotheses in the Wild ☆51 Updated 2 months ago
- ☆115 Updated this week
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆192 Updated last week
- Machine Learning for Alignment Bootcamp (MLAB). ☆24 Updated 2 years ago
- ☆41 Updated this week
- Steering Llama 2 with Contrastive Activation Addition ☆114 Updated 7 months ago