aidecentralized / sonar
SONAR - Self-Organizing Network of Aggregated Representations
☆16 Updated this week
Alternatives and similar repositories for sonar
Users who are interested in sonar are comparing it to the libraries listed below.
- ☆14 Updated last week
- ☆129 Updated last month
- Open source interpretability artefacts for R1. ☆131 Updated 3 weeks ago
- ☆172 Updated last month
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆92 Updated this week
- Simulation framework for accelerating research in Private Federated Learning ☆327 Updated this week
- 🧠 Starter templates for doing interpretability research ☆70 Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆102 Updated last year
- Verdict is a library for scaling judge-time compute. ☆211 Updated 2 weeks ago
- A lightweight framework for building research agents designed for developers ☆84 Updated this week
- ControlArena is a suite of realistic settings, mimicking complex deployment environments, for running control evaluations. This is an alp… ☆57 Updated this week
- A catalogue of existing Nanda servers ☆119 Updated 3 weeks ago
- ☆54 Updated 7 months ago
- ☆74 Updated 3 weeks ago
- ☆22 Updated last week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 Updated 2 months ago
- anything you want can be built with morph cloud ☆12 Updated 2 weeks ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆122 Updated 2 years ago
- ☆68 Updated last year
- METR Task Standard ☆146 Updated 3 months ago
- A Mechanistic Interpretability Analysis of Grokking ☆21 Updated 2 years ago
- open source interpretability platform 🧠 ☆118 Updated this week
- ☆78 Updated this week
- Mechanistic Interpretability Visualizations using React ☆245 Updated 5 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆203 Updated 7 months ago
- ☆148 Updated 2 months ago
- DISCO is a code-free and installation-free browser platform that allows any non-technical user to collaboratively train machine learning … ☆166 Updated this week
- Sphynx Hallucination Induction ☆54 Updated 3 months ago
- Repository with sample code using Apollo's suggested engineering practices ☆9 Updated 5 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆182 Updated 6 months ago