stanfordnlp / pyvene
Stanford NLP Python library for understanding and improving PyTorch models via interventions
☆693 · Updated this week
Alternatives and similar repositories for pyvene:
Users who are interested in pyvene are comparing it to the libraries listed below.
- Training Sparse Autoencoders on Language Models ☆614 · Updated this week
- Sparsify transformers with SAEs and transcoders ☆458 · Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆488 · Updated this week
- Sparse Autoencoder for Mechanistic Interpretability ☆215 · Updated 6 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆213 · Updated last year
- ☆418 · Updated 6 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆182 · Updated 2 months ago
- Representation Engineering: A Top-Down Approach to AI Transparency ☆787 · Updated 6 months ago
- ☆235 · Updated this week
- Mechanistic Interpretability Visualizations using React ☆227 · Updated last month
- Tools for understanding how transformer predictions are built layer-by-layer ☆472 · Updated 8 months ago
- Locating and editing factual associations in GPT (NeurIPS 2022) ☆605 · Updated 9 months ago
- A library for mechanistic interpretability of GPT-style language models ☆1,853 · Updated this week
- This repository collects all relevant resources about interpretability in LLMs ☆316 · Updated 3 months ago
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆496 · Updated 2 weeks ago
- Interpretability for sequence generation models 🐛 🔍 ☆398 · Updated 3 months ago
- Mass-editing thousands of facts into a transformer memory (ICLR 2023) ☆462 · Updated last year
- ☆203 · Updated 4 months ago
- ☆189 · Updated 11 months ago
- ☆262 · Updated 11 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆798 · Updated this week
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆492 · Updated 7 months ago
- ☆109 · Updated 6 months ago
- ☆149 · Updated this week
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆462 · Updated 3 weeks ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆209 · Updated last year
- Extract full next-token probabilities via language model APIs ☆228 · Updated 11 months ago
- A library for making RepE control vectors ☆549 · Updated last month
- Utilities for decoding deep representations (like sentence embeddings) back to text ☆761 · Updated 3 weeks ago
- RewardBench: the first evaluation tool for reward models. ☆503 · Updated this week