MinhxLe / subliminal-learning
☆97 Updated 2 months ago
Alternatives and similar repositories for subliminal-learning
Users who are interested in subliminal-learning are comparing it to the libraries listed below.
- Open source interpretability artefacts for R1. ☆161 Updated 5 months ago
- ☆142 Updated last month
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆129 Updated 3 years ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆55 Updated 7 months ago
- ☆109 Updated 5 months ago
- ☆59 Updated 2 weeks ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆208 Updated this week
- METR Task Standard ☆163 Updated 8 months ago
- ☆104 Updated 2 weeks ago
- Attribution-based Parameter Decomposition ☆31 Updated 4 months ago
- ☆102 Updated 9 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆55 Updated last week
- ☆52 Updated last year
- PageRank for LLMs ☆50 Updated last month
- The history files when recording human interaction while solving ARC tasks ☆116 Updated last week
- An introduction to LLM Sampling ☆79 Updated 9 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆205 Updated 11 months ago
- we got you bro ☆36 Updated last year
- ☆109 Updated 8 months ago
- ☆26 Updated 11 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 Updated last year
- Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr… ☆64 Updated last week
- Training-Ready RL Environments + Evals ☆121 Updated this week
- Measuring the situational awareness of language models ☆38 Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆117 Updated 6 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- A reading list of relevant papers and projects on foundation model annotation ☆28 Updated 7 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆115 Updated this week
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Updated 5 months ago
- ☆27 Updated 4 months ago