PAIR-code / pretraining-tda
☆29 · Updated 8 months ago
Alternatives and similar repositories for pretraining-tda
Users interested in pretraining-tda are comparing it to the libraries listed below.
- AI Logging for Interpretability and Explainability 🔬 ☆130 · Updated last year
- ☆133 · Updated last week
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆56 · Updated last year
- ☆98 · Updated 2 years ago
- Sparse probing paper full code. ☆62 · Updated last year
- ☆181 · Updated 11 months ago
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆12 · Updated 9 months ago
- [ICLR 2025] General-purpose activation steering library ☆114 · Updated last month
- A library for efficient patching and automatic circuit discovery. ☆78 · Updated 3 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆219 · Updated last week
- ☆191 · Updated 2 weeks ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆83 · Updated 7 months ago
- ☆236 · Updated last year
- ☆92 · Updated last year
- ☆49 · Updated 2 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆136 · Updated 4 months ago