HanjieChen / Reading-List
☆46 Updated last year
Alternatives and similar repositories for Reading-List
Users who are interested in Reading-List are comparing it to the libraries listed below.
- awesome SAE papers ☆43 Updated 3 months ago
- ☆172 Updated last year
- ☆55 Updated 2 years ago
- A resource repository for representation engineering in large language models ☆131 Updated 9 months ago
- ☆146 Updated last year
- A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc. ☆265 Updated 5 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆76 Updated 5 months ago
- ☆187 Updated 9 months ago
- ☆36 Updated 2 years ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆41 Updated 9 months ago
- ☆25 Updated 2 months ago
- A curated list of resources for activation engineering ☆101 Updated 3 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆63 Updated 9 months ago
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆99 Updated 2 weeks ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆114 Updated 11 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆99 Updated this week
- ☆29 Updated last year
- [NeurIPS 2024] How do Large Language Models Handle Multilingualism? ☆39 Updated 9 months ago
- A Survey on Data Selection for Language Models ☆247 Updated 4 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆82 Updated 11 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆144 Updated 2 weeks ago
- This repository collects all relevant resources about interpretability in LLMs ☆369 Updated 10 months ago
- ☆11 Updated 6 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆74 Updated 10 months ago
- ☆32 Updated 8 months ago
- Materials for EACL2024 tutorial: Transformer-specific Interpretability ☆60 Updated last year
- ☆81 Updated 8 months ago
- [ICLR 2025] General-purpose activation steering library ☆95 Updated last month
- UnQovering Stereotyping Biases via Underspecified Questions - EMNLP 2020 (Findings) ☆22 Updated 4 years ago
- Awesome-LLM-Robustness: a curated list of Uncertainty, Reliability and Robustness in Large Language Models ☆777 Updated 3 months ago