Aligning AI With Shared Human Values (ICLR 2021)
☆321 · Apr 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the libraries listed below.
- Jiminy Cricket Environment (NeurIPS 2021) ☆25 · Feb 12, 2022 · Updated 4 years ago
- Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) pap… ☆62 · Jul 18, 2022 · Updated 3 years ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆53 · Sep 26, 2024 · Updated last year
- Social Chemistry 101: Learning to Reason about Social and Moral Norms ☆34 · Mar 17, 2023 · Updated 3 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024. ☆16 · Oct 28, 2024 · Updated last year
- ☆15 · Oct 23, 2023 · Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆87 · Mar 2, 2021 · Updated 5 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 · Dec 13, 2024 · Updated last year
- ☆230 · Feb 23, 2021 · Updated 5 years ago
- A test suite (a.k.a., dataset) with ~20k moral situations for understanding LLMs' behaviors. ☆16 · May 5, 2023 · Updated 2 years ago
- ☆148 · Jul 23, 2025 · Updated 8 months ago
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" ☆65 · Aug 2, 2023 · Updated 2 years ago
- ☆22 · Feb 8, 2025 · Updated last year
- A PyTorch Implementation of the EMNLP 2020 paper "Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning" ☆13 · Feb 20, 2021 · Updated 5 years ago
- ☆23 · Mar 8, 2024 · Updated 2 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆21 · Jul 18, 2023 · Updated 2 years ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,574 · May 28, 2023 · Updated 2 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… ☆31 · Jan 14, 2023 · Updated 3 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Jul 17, 2024 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆155 · Aug 18, 2025 · Updated 8 months ago
- Butler is a tool project for automated service management and task scheduling. ☆16 · Updated this week
- Machine Learning scripts for the identification of human values behind arguments. ☆24 · Mar 12, 2024 · Updated 2 years ago
- ☆19 · Jun 21, 2025 · Updated 9 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆201 · Dec 8, 2022 · Updated 3 years ago
- Augmenting Statistical Models with Natural Language Parameters ☆28 · Sep 17, 2024 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆132 · Feb 24, 2025 · Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 · Jun 28, 2025 · Updated 9 months ago
- ☆14 · Jul 27, 2020 · Updated 5 years ago
- ☆38 · May 2, 2024 · Updated last year
- Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models" ☆32 · May 31, 2021 · Updated 4 years ago
- ☆22 · Jul 18, 2024 · Updated last year
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆903 · Jan 16, 2025 · Updated last year
- IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases". ☆16 · Aug 14, 2020 · Updated 5 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,837 · Jun 17, 2025 · Updated 10 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆313 · Sep 16, 2024 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆194 · Apr 17, 2025 · Updated last year
- ☆24 · Jul 25, 2024 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆983 · Aug 14, 2024 · Updated last year
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆137 · Jul 8, 2024 · Updated last year