ajobi-uhc / seer
seer is designed for interpretability researchers who want to do research on or with interp agents. It adds quality-of-life improvements and fixes some of the annoyances of using Claude Code out of the box.
☆101 · Updated 3 weeks ago
Alternatives and similar repositories for seer
Users interested in seer are comparing it to the libraries listed below.
- Unified access to Large Language Model modules using NNsight ☆71 · Updated last week
- Mechanistic Interpretability Visualizations using React ☆307 · Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆235 · Updated 2 weeks ago
- Inference API for many LLMs and other useful tools for empirical research ☆91 · Updated 2 weeks ago
- Open source interpretability artefacts for R1. ☆165 · Updated 8 months ago
- ☆58 · Updated last year
- ☆83 · Updated 10 months ago
- ☆193 · Updated last year
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. ☆49 · Updated this week
- ☆262 · Updated last year
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆63 · Updated last year
- ControlArena is a collection of settings, model organisms, and protocols for running control experiments. ☆145 · Updated 3 weeks ago
- ☆132 · Updated 2 years ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆236 · Updated 5 months ago
- ☆77 · Updated 3 weeks ago
- Attribution-based Parameter Decomposition ☆33 · Updated 7 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆236 · Updated last year
- ☆380 · Updated 4 months ago
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆279 · Updated 3 weeks ago
- Sparse Autoencoder for Mechanistic Interpretability ☆285 · Updated last year
- ☆227 · Updated last year
- A toolkit for describing model features and intervening on those features to steer behavior. ☆225 · Updated 3 weeks ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆31 · Updated 8 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆137 · Updated 10 months ago
- ☆202 · Updated 2 months ago
- ☆20 · Updated 9 months ago
- Sparse Autoencoder Training Library ☆56 · Updated 8 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆389 · Updated last year
- ☆82 · Updated 3 months ago
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆758 · Updated this week
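
Since several of the libraries above (including the last entry, nnsight) are accessed through a tracing-style Python API, here is a minimal sketch of the kind of usage they enable. The model name, layer index, and prompt are illustrative assumptions, and exact proxy semantics vary by nnsight version (older releases require `.value` to read a saved proxy), so treat this as a sketch rather than a definitive example.

```python
# Minimal sketch: inspect a hidden layer of GPT-2 with nnsight's tracing interface.
# Assumptions: nnsight installed, GPT-2 weights downloadable, layer 5 chosen arbitrarily.
from nnsight import LanguageModel

# Wrap a Hugging Face model so its internals can be traced.
model = LanguageModel("openai-community/gpt2", device_map="auto")

with model.trace("The Eiffel Tower is in the city of"):
    # Save the hidden states output by transformer block 5 and the final logits.
    hidden = model.transformer.h[5].output[0].save()
    logits = model.lm_head.output.save()

# After the trace exits, the saved proxies hold concrete tensors
# (on older nnsight versions, access them via hidden.value / logits.value).
print(hidden.shape, logits.shape)
```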