HanjieChen / Reading-List
☆41 · Updated last year
Alternatives and similar repositories for Reading-List:
Users interested in Reading-List are comparing it to the libraries listed below.
- ☆128 · Updated last year
- ☆47 · Updated last year
- ☆164 · Updated 9 months ago
- ☆36 · Updated last year
- A resource repository for representation engineering in large language models ☆117 · Updated 4 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ☆71 · Updated 3 weeks ago
- ☆25 · Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆107 · Updated 6 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆52 · Updated 4 months ago
- Code for the EMNLP 2024 paper "Neuron-Level Knowledge Attribution in Large Language Models" ☆29 · Updated 4 months ago
- ☆22 · Updated 6 months ago
- Awesome SAE papers ☆25 · Updated last month
- LoFiT: Localized Fine-tuning on LLM Representations ☆35 · Updated 2 months ago
- ☆40 · Updated 4 months ago
- UnQovering Stereotyping Biases via Underspecified Questions (EMNLP 2020 Findings) ☆22 · Updated 3 years ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆58 · Updated last year
- ☆38 · Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation ☆90 · Updated this week
- ☆199 · Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆63 · Updated 6 months ago
- ☆155 · Updated 4 months ago
- ☆93 · Updated last year
- Steering Llama 2 with Contrastive Activation Addition ☆134 · Updated 10 months ago
- ☆14 · Updated 2 months ago
- ☆173 · Updated 8 months ago
- ☆29 · Updated 11 months ago
- AI Logging for Interpretability and Explainability 🔬 ☆110 · Updated 9 months ago
- LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces ☆91 · Updated last year
- Repo for the paper "Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge" ☆13 · Updated last year
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆80 · Updated 6 months ago