Aligning AI With Shared Human Values (ICLR 2021)
☆316Apr 21, 2023Updated 2 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the libraries listed below.
- Jiminy Cricket Environment (NeurIPS 2021)☆25Feb 12, 2022Updated 4 years ago
- Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) pap…☆63Jul 18, 2022Updated 3 years ago
- A corpus and code for understanding norms and subjectivity. 🤖☆53Sep 26, 2024Updated last year
- Social Chemistry 101: Learning to Reason about Social and Moral Norms☆34Mar 17, 2023Updated 3 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- ☆15Oct 23, 2023Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper☆87Mar 2, 2021Updated 5 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- ☆230Feb 23, 2021Updated 5 years ago
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents"☆65Aug 2, 2023Updated 2 years ago
- ☆21Feb 8, 2025Updated last year
- A PyTorch Implementation of the EMNLP 2020 paper "Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning"☆13Feb 20, 2021Updated 5 years ago
- ☆23Mar 8, 2024Updated 2 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems"☆21Jul 18, 2023Updated 2 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation☆14Aug 19, 2025Updated 7 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021☆1,569May 28, 2023Updated 2 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w…☆32Jan 14, 2023Updated 3 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.☆154Aug 18, 2025Updated 7 months ago
- Butler is a tool project for automated service management and task scheduling.☆16Updated this week
- Machine Learning scripts for the identification of human values behind arguments.☆24Mar 12, 2024Updated 2 years ago
- ☆19Jun 21, 2025Updated 9 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models☆200Dec 8, 2022Updated 3 years ago
- Augmenting Statistical Models with Natural Language Parameters☆28Sep 17, 2024Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆131Feb 24, 2025Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models☆17Jun 28, 2025Updated 9 months ago
- ☆14Jul 27, 2020Updated 5 years ago
- ☆38May 2, 2024Updated last year
- Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models"☆32May 31, 2021Updated 4 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods☆896Jan 16, 2025Updated last year
- ☆22Jul 18, 2024Updated last year
- IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases".☆16Aug 14, 2020Updated 5 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"☆1,833Jun 17, 2025Updated 9 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models☆315Sep 16, 2024Updated last year
- Function Vectors in Large Language Models (ICLR 2024)☆195Apr 17, 2025Updated 11 months ago
- ☆24Jul 25, 2024Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency☆969Aug 14, 2024Updated last year
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆137Jul 8, 2024Updated last year
- GLUCOSE: GeneraLized and COntextualized Story Explanations https://arxiv.org/abs/2009.07758☆96Mar 1, 2021Updated 5 years ago