baixuechunzi / llm-implicit-bias
☆22 · Updated 9 months ago
Alternatives and similar repositories for llm-implicit-bias
Users who are interested in llm-implicit-bias are comparing it to the repositories listed below.
- ☆50 · Updated last year
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆107 · Updated 3 weeks ago
- ☆57 · Updated 2 years ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models ☆48 · Updated last year
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" ☆19 · Updated 6 months ago
- awesome SAE papers ☆66 · Updated 6 months ago
- Repository for the Bias Benchmark for QA dataset. ☆133 · Updated last year
- Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misin… ☆104 · Updated last year
- ☆116 · Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆67 · Updated last year
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs. ☆50 · Updated 2 years ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆57 · Updated last year
- ☆76 · Updated last year
- [ACL'2024 Findings] "Understanding and Patching Compositional Reasoning in LLMs" ☆13 · Updated last year
- The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller" ☆18 · Updated last year
- The Prism Alignment Project ☆86 · Updated last year
- Personality Alignment of Language Models ☆51 · Updated 5 months ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆40 · Updated last year
- Safety-J: Evaluating Safety with Critique ☆16 · Updated last year
- [NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models ☆105 · Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆33 · Updated 9 months ago
- Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers" ☆173 · Updated last year
- ☆28 · Updated last year
- ☆27 · Updated 2 years ago
- A resource repository for representation engineering in large language models ☆143 · Updated last year
- Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp… ☆18 · Updated 11 months ago
- Code and Results of the Paper: On the Reliability of Psychological Scales on Large Language Models ☆30 · Updated last year
- LLM Unlearning ☆178 · Updated 2 years ago
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge ☆14 · Updated last year
- ☆156 · Updated 2 years ago