baixuechunzi / llm-implicit-bias
☆22Updated 8 months ago
Alternatives and similar repositories for llm-implicit-bias
Users who are interested in llm-implicit-bias are comparing it to the repositories listed below
- Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misin…☆103Updated last year
- A resource repository for representation engineering in large language models☆140Updated last year
- ☆41Updated last year
- ☆49Updated 11 months ago
- The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"☆18Updated last year
- ☆57Updated 2 years ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models☆47Updated last year
- awesome SAE papers☆59Updated 5 months ago
- Toolkit for evaluating the trustworthiness of generative foundation models.☆123Updated 2 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆116Updated 8 months ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering☆67Updated 11 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation.☆104Updated 2 months ago
- ☆116Updated last year
- Repository for the Bias Benchmark for QA dataset.☆129Updated last year
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"☆99Updated 6 months ago
- [NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models☆103Updated last year
- ☆28Updated last year
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆145Updated last year
- Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers"☆173Updated last year
- ☆41Updated 2 years ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces☆99Updated 2 years ago
- ☆25Updated last year
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"☆78Updated last year
- ☆76Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model☆69Updated 3 years ago
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge☆14Updated last year
- Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR.☆23Updated 7 months ago
- LoFiT: Localized Fine-tuning on LLM Representations☆43Updated 10 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors☆82Updated 11 months ago
- A lightweight library for large language model (LLM) jailbreaking defense.☆59Updated 2 months ago