Aligning AI With Shared Human Values (ICLR 2021)
☆324 · Apr 21, 2023 · Updated 3 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the libraries listed below.
- Jiminy Cricket Environment (NeurIPS 2021) · ☆25 · Feb 12, 2022 · Updated 4 years ago
- A corpus and code for understanding norms and subjectivity. · ☆54 · Sep 26, 2024 · Updated last year
- ☆16 · Oct 23, 2023 · Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper · ☆88 · Mar 2, 2021 · Updated 5 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" · ☆10 · Dec 13, 2024 · Updated last year
- ☆233 · Feb 23, 2021 · Updated 5 years ago
- A test suite (a.k.a., dataset) with ~20k moral situations for understanding LLMs' behaviors. · ☆16 · May 5, 2023 · Updated 3 years ago
- 🔥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" · ☆67 · Aug 2, 2023 · Updated 2 years ago
- ☆23 · Feb 8, 2025 · Updated last year
- A PyTorch Implementation of the EMNLP 2020 paper "Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning" · ☆13 · Feb 20, 2021 · Updated 5 years ago
- ☆24 · Mar 8, 2024 · Updated 2 years ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" · ☆21 · Jul 18, 2023 · Updated 2 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation · ☆14 · Aug 19, 2025 · Updated 10 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 · ☆1,588 · May 28, 2023 · Updated 3 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… · ☆30 · Jan 14, 2023 · Updated 3 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models · ☆17 · Jul 17, 2024 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. · ☆156 · Aug 18, 2025 · Updated 10 months ago
- Butler is a tool project for automated service management and task scheduling. · ☆16 · Jun 11, 2026 · Updated last week
- we got you bro · ☆38 · Jul 29, 2024 · Updated last year
- ☆19 · Jun 21, 2025 · Updated 11 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models · ☆202 · Dec 8, 2022 · Updated 3 years ago
- Augmenting Statistical Models with Natural Language Parameters · ☆28 · Sep 17, 2024 · Updated last year
- ☆396 · Jul 2, 2024 · Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models · ☆17 · Jun 28, 2025 · Updated 11 months ago
- ☆14 · Jul 27, 2020 · Updated 5 years ago
- Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models" · ☆32 · May 31, 2021 · Updated 5 years ago
- ☆22 · Jul 18, 2024 · Updated last year
- IPython notebook with synthetic experiments for AFLite, based on the ICML 2020 paper, "Adversarial Filters of Dataset Biases". · ☆16 · Aug 14, 2020 · Updated 5 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods · ☆926 · Jan 16, 2025 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" · ☆136 · Feb 24, 2025 · Updated last year
- A Comprehensive Assessment of Trustworthiness in GPT Models · ☆316 · Sep 16, 2024 · Updated last year
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" · ☆1,839 · Jun 17, 2025 · Updated last year
- ☆24 · Jul 25, 2024 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) · ☆197 · Apr 30, 2026 · Updated last month
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following · ☆138 · Jul 8, 2024 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency · ☆1,006 · Aug 14, 2024 · Updated last year
- GLUCOSE: GeneraLized and COntextualized Story Explanations https://arxiv.org/abs/2009.07758 · ☆97 · Mar 1, 2021 · Updated 5 years ago
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con… · ☆57 · Dec 20, 2023 · Updated 2 years ago
- Provides functionality for converting 모두의 말뭉치 (Modu Corpus) data into a format convenient for analysis. · ☆11 · Mar 2, 2022 · Updated 4 years ago