PAIR-code / pretraining-tda
☆26 Updated 7 months ago
Alternatives and similar repositories for pretraining-tda
Users that are interested in pretraining-tda are comparing it to the libraries listed below
- AI Logging for Interpretability and Explainability🔬 ☆127 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆76 Updated last month
- Steering Llama 2 with Contrastive Activation Addition ☆180 Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆12 Updated 7 months ago
- ☆97 Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆54 Updated 11 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆211 Updated last week
- ☆122 Updated last year
- ☆121 Updated this week
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆128 Updated 2 months ago
- ☆48 Updated last year
- ☆186 Updated 2 months ago
- ☆51 Updated last month
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆78 Updated 6 months ago
- Sparse probing paper full code. ☆60 Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface ☆124 Updated 6 months ago
- [ICLR 2025] General-purpose activation steering library ☆102 Updated 3 weeks ago
- ☆168 Updated 10 months ago
- ☆91 Updated last year
- ☆107 Updated 7 months ago
- ☆229 Updated last year
- ☆55 Updated 2 years ago
- Function Vectors in Large Language Models (ICLR 2024) ☆179 Updated 5 months ago
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆41 Updated last year
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆59 Updated 10 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆76 Updated 11 months ago
- ☆50 Updated last year
- Algebraic value editing in pretrained language models ☆65 Updated last year
- ☆13 Updated last year
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆79 Updated last year