i-gallegos / Fair-LLM-Benchmark

☆154

Alternatives and similar repositories for Fair-LLM-Benchmark

Users who are interested in Fair-LLM-Benchmark are comparing it to the repositories listed below.

McGill-NLP / bias-bench
ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.
☆149 Updated 2 months ago
amazon-science / bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper
☆81 Updated 4 years ago
nyu-mll / BBQ
Repository for the Bias Benchmark for QA dataset.
☆129 Updated last year
SALT-NLP / Efficient_Unlearning
☆38 Updated 2 years ago
princeton-nlp / corpus-poisoning
[EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156
☆40 Updated last year
joeljang / knowledge-unlearning
[ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models
☆83 Updated last year
balevinstein / Probes
☆56 Updated 2 years ago
nyu-mll / crows-pairs
This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske…
☆125 Updated last year
lorenzkuhn / semantic_uncertainty
☆179 Updated last year
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆138 Updated 11 months ago
SALT-NLP / chain-of-thought-bias
☆28 Updated last year
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆176 Updated 2 years ago
vinid / safety-tuned-llamas
ICLR 2024 paper. Showing properties of safety tuning and exaggerated safety.
☆87 Updated last year
martiansideofthemoon / ai-detection-paraphrases
Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense…
☆176 Updated last year
CharlesYu2000 / PCGU-UnlearningBias
☆17 Updated last year
llm-misinformation / llm-misinformation
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
☆77 Updated 11 months ago
EmpathYang / ADEPT
Source code and data for ADEPT: A DEbiasing PrompT Framework (AAAI-23).
☆15 Updated 10 months ago
declare-lab / red-instruct
Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"
☆105 Updated last year
llm-misinformation / llm-misinformation-survey
Paper list for the survey "Combating Misinformation in the Age of LLMs: Opportunities and Challenges" and the initiative "LLMs Meet Misin…
☆103 Updated 11 months ago
umanlp / RedditBias
Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models"
☆31 Updated 4 years ago
myracheng / markedpersonas
Code and data for Marked Personas (ACL 2023)
☆28 Updated 2 years ago
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116 Updated 8 months ago
snw2021 / LLM_Unlearning_Papers
☆26 Updated last year
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆82 Updated 7 months ago
D2I-ai / eigenscore
☆37 Updated 10 months ago
HanjieChen / Reading-List
☆52 Updated last year
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆97 Updated 5 months ago
tatsu-lab / opinions_qa
☆116 Updated last year
facebookresearch / ResponsibleNLP
Repository for research in the field of Responsible NLP at Meta.
☆202 Updated 5 months ago
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆394 Updated 6 months ago