MinhxLe / subliminal-learning
☆100 Updated 3 months ago
Alternatives and similar repositories for subliminal-learning
Users who are interested in subliminal-learning are comparing it to the libraries listed below
- ☆142 Updated last month
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- ☆61 Updated last month
- Open source interpretability artefacts for R1. ☆163 Updated 6 months ago
- ☆80 Updated 2 weeks ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆211 Updated this week
- An attribution library for LLMs ☆43 Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆76 Updated 8 months ago
- ☆114 Updated 2 weeks ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆129 Updated 3 years ago
- ☆29 Updated 4 months ago
- Draw more samples ☆194 Updated last year
- Evaluating LLMs with fewer examples ☆164 Updated last year
- METR Task Standard ☆163 Updated 8 months ago
- ☆111 Updated 8 months ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆124 Updated 6 months ago
- Training-Ready RL Environments + Evals ☆158 Updated this week
- ☆104 Updated 2 weeks ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆118 Updated last week
- ☆103 Updated 9 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆188 Updated 7 months ago
- Materials for ConceptARC paper ☆104 Updated 11 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆193 Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆56 Updated 3 weeks ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 Updated 6 months ago
- Sphynx Hallucination Induction ☆53 Updated 9 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆209 Updated 11 months ago
- ☆26 Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆224 Updated 10 months ago
- An introduction to LLM Sampling ☆79 Updated 10 months ago