UKGovernmentBEIS / hibayes
☆29 Updated this week
Alternatives and similar repositories for hibayes
Users who are interested in hibayes are comparing it to the libraries listed below.
- Collection of evals for Inspect AI ☆205 Updated this week
- METR Task Standard ☆157 Updated 6 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆103 Updated this week
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments. ☆82 Updated last week
- ☆99 Updated 4 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆272 Updated last month
- A toolkit for describing model features and intervening on those features to steer behavior. ☆196 Updated 9 months ago
- ☆25 Updated 2 months ago
- ☆95 Updated 3 months ago
- Inspect: A framework for large language model evaluations ☆1,238 Updated this week
- ☆113 Updated this week
- Open source interpretability artefacts for R1. ☆157 Updated 3 months ago
- ☆136 Updated 4 months ago
- ☆289 Updated last year
- ☆98 Updated last week
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆221 Updated last year
- Mechanistic Interpretability Visualizations using React ☆277 Updated 7 months ago
- A library for making RepE control vectors ☆624 Updated 7 months ago
- ☆56 Updated 2 weeks ago
- ☆182 Updated 5 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆209 Updated this week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆113 Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆205 Updated this week
- Sparsify transformers with SAEs and transcoders ☆604 Updated this week
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆238 Updated last week
- open source interpretability platform 🧠 ☆324 Updated this week
- An attribution library for LLMs ☆42 Updated 10 months ago
- Draw more samples ☆193 Updated last year
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆582 Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆176 Updated 5 months ago