baixuechunzi / llm-implicit-biasLinks

☆22

Alternatives and similar repositories for llm-implicit-bias

Users that are interested in llm-implicit-bias are comparing it to the libraries listed below

Sorting:

llm-misinformation / llm-misinformation-survey
Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misin…
☆103Updated last year
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆140Updated last year
princeton-nlp / benign-data-breaks-safety
☆41Updated last year
OAfzal / nlp-for-peer-review
☆49Updated 11 months ago
HenryCai11 / LLM-Self-Control
The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"
☆18Updated last year
balevinstein / Probes
☆57Updated 2 years ago
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆47Updated last year
zepingyu0512 / awesome-SAE
awesome SAE papers
☆59Updated 5 months ago
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆123Updated 2 months ago
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116Updated 8 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67Updated 11 months ago
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆104Updated 2 months ago
tatsu-lab / opinions_qa
☆116Updated last year
nyu-mll / BBQ
Repository for the Bias Benchmark for QA dataset.
☆129Updated last year
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆99Updated 6 months ago
zhiyuanhubj / UoT
[NeurIPS 2024] Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models
☆103Updated last year
SALT-NLP / chain-of-thought-bias
☆28Updated last year
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆145Updated last year
Hunter-DDM / knowledge-neurons
Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers"
☆173Updated last year
tatsu-lab / test_set_contamination
☆41Updated 2 years ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆99Updated 2 years ago
flamewei123 / DEPN
☆25Updated last year
llm-misinformation / llm-misinformation
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
☆78Updated last year
GAIR-NLP / alignment-for-honesty
☆76Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆69Updated 3 years ago
genglinliu / UnknownBench
Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
☆14Updated last year
JunsolKim / RepresentationPoliticalLLM
Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR.
☆23Updated 7 months ago
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆43Updated 10 months ago
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆82Updated 11 months ago
YihanWang617 / llm-jailbreaking-defense
A lightweight library for large laguage model (LLM) jailbreaking defense.
☆59Updated 2 months ago