Aligning AI With Shared Human Values (ICLR 2021)
☆321 Apr 21, 2023 Updated 3 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the libraries listed below.
- Jiminy Cricket Environment (NeurIPS 2021) ☆25 Feb 12, 2022 Updated 4 years ago
- Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) pap… ☆62 Jul 18, 2022 Updated 3 years ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆53 Sep 26, 2024 Updated last year
- Social Chemistry 101: Learning to Reason about Social and Moral Norms ☆35 Mar 17, 2023 Updated 3 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. arXiv, 2024. ☆16 Oct 28, 2024 Updated last year
- ☆16 Oct 23, 2023 Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆87 Mar 2, 2021 Updated 5 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 Dec 13, 2024 Updated last year
- ☆231 Feb 23, 2021 Updated 5 years ago
- A test suite (a.k.a., dataset) with ~20k moral situations for understanding LLMs' behaviors. ☆16 May 5, 2023 Updated 3 years ago
- ☆150 Jul 23, 2025 Updated 9 months ago
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" ☆65 Aug 2, 2023 Updated 2 years ago
- A PyTorch Implementation of the EMNLP 2020 paper "Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning" ☆13 Feb 20, 2021 Updated 5 years ago
- ☆23 Mar 8, 2024 Updated 2 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆21 Jul 18, 2023 Updated 2 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 Aug 19, 2025 Updated 8 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,580 May 28, 2023 Updated 2 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 Jul 17, 2024 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆156 Aug 18, 2025 Updated 8 months ago
- Butler is a tool project for automated service management and task scheduling. ☆16 Updated this week
- ☆19 Jun 21, 2025 Updated 10 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆201 Dec 8, 2022 Updated 3 years ago
- Augmenting Statistical Models with Natural Language Parameters ☆28 Sep 17, 2024 Updated last year
- ☆376 Jul 2, 2024 Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 Jun 28, 2025 Updated 10 months ago
- ☆39 May 2, 2024 Updated 2 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆133 Feb 24, 2025 Updated last year
- ☆14 Jul 27, 2020 Updated 5 years ago
- Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models" ☆32 May 31, 2021 Updated 4 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆909 Jan 16, 2025 Updated last year
- IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases". ☆16 Aug 14, 2020 Updated 5 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,840 Jun 17, 2025 Updated 10 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆314 Sep 16, 2024 Updated last year
- Function Vectors in Large Language Models (ICLR 2024) ☆195 Apr 30, 2026 Updated last week
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆138 Jul 8, 2024 Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆991 Aug 14, 2024 Updated last year
- GLUCOSE: GeneraLized and COntextualized Story Explanations https://arxiv.org/abs/2009.07758 ☆96 Mar 1, 2021 Updated 5 years ago
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con… ☆55 Dec 20, 2023 Updated 2 years ago
- Provides functionality for converting Modu Corpus (모두의 말뭉치) data into a form convenient for analysis. ☆11 Mar 2, 2022 Updated 4 years ago