Aligning AI With Shared Human Values (ICLR 2021)
☆323 · Apr 21, 2023 · Updated 3 years ago
Alternatives and similar repositories for ethics
Users that are interested in ethics are comparing it to the libraries listed below.
- Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) pap… ☆62 · Jul 18, 2022 · Updated 3 years ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆54 · Sep 26, 2024 · Updated last year
- Social Chemistry 101: Learning to Reason about Social and Moral Norms ☆35 · Mar 17, 2023 · Updated 3 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. arXiv, 2024. ☆16 · Oct 28, 2024 · Updated last year
- ☆16 · Oct 23, 2023 · Updated 2 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 · Dec 13, 2024 · Updated last year
- ☆231 · Feb 23, 2021 · Updated 5 years ago
- ☆150 · Jul 23, 2025 · Updated 10 months ago
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" ☆67 · Aug 2, 2023 · Updated 2 years ago
- ☆23 · Feb 8, 2025 · Updated last year
- A PyTorch Implementation of the EMNLP 2020 paper "Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning" ☆13 · Feb 20, 2021 · Updated 5 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆21 · Jul 18, 2023 · Updated 2 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 · Aug 19, 2025 · Updated 9 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,578 · May 28, 2023 · Updated 3 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… ☆30 · Jan 14, 2023 · Updated 3 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Jul 17, 2024 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆156 · Aug 18, 2025 · Updated 9 months ago
- Butler is a tool project for automated service management and task scheduling. ☆16 · May 16, 2026 · Updated 2 weeks ago
- Machine Learning scripts for the identification of human values behind arguments. ☆24 · Mar 12, 2024 · Updated 2 years ago
- ☆19 · Jun 21, 2025 · Updated 11 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆201 · Dec 8, 2022 · Updated 3 years ago
- Augmenting Statistical Models with Natural Language Parameters ☆28 · Sep 17, 2024 · Updated last year
- ☆389 · Jul 2, 2024 · Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 · Jun 28, 2025 · Updated 11 months ago
- ☆39 · May 2, 2024 · Updated 2 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆134 · Feb 24, 2025 · Updated last year
- ☆14 · Jul 27, 2020 · Updated 5 years ago
- Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models" ☆32 · May 31, 2021 · Updated 4 years ago
- ☆22 · Jul 18, 2024 · Updated last year
- IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases". ☆16 · Aug 14, 2020 · Updated 5 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆919 · Jan 16, 2025 · Updated last year
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,841 · Jun 17, 2025 · Updated 11 months ago
- ☆24 · Jul 25, 2024 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆197 · Apr 30, 2026 · Updated last month
- Representation Engineering: A Top-Down Approach to AI Transparency ☆994 · Aug 14, 2024 · Updated last year
- GLUCOSE: GeneraLized and COntextualized Story Explanations https://arxiv.org/abs/2009.07758 ☆96 · Mar 1, 2021 · Updated 5 years ago
- Provides functionality for converting 모두의 말뭉치 (Modu Corpus) data into a form convenient for analysis. ☆11 · Mar 2, 2022 · Updated 4 years ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models" ☆19 · Dec 16, 2024 · Updated last year
- Generated geosite.dat based on Antifilter Community List ☆28 · Updated this week