Erasing concepts from neural representations with provable guarantees
☆254 · Jan 27, 2025 · Updated last year
Alternatives and similar repositories for concept-erasure
Users that are interested in concept-erasure are comparing it to the libraries listed below.
- ☆39 · Jul 14, 2022 · Updated 3 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆220 · Jun 8, 2026 · Updated last week
- ☆12 · Oct 23, 2022 · Updated 3 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆31 · May 23, 2024 · Updated 2 years ago
- Tools for understanding how transformer predictions are built layer-by-layer ☆594 · Aug 7, 2025 · Updated 10 months ago
- Algebraic value editing in pretrained language models ☆70 · Nov 1, 2023 · Updated 2 years ago
- Understanding how features learned by neural networks evolve throughout training ☆41 · Oct 24, 2024 · Updated last year
- ☆284 · Mar 2, 2024 · Updated 2 years ago
- ☆56 · Oct 23, 2023 · Updated 2 years ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆260 · Jun 8, 2026 · Updated last week
- Mechanistic Interpretability Visualizations using React ☆351 · Apr 30, 2026 · Updated last month