openai / moderation-api-release
☆120 · Updated 2 years ago
Alternatives and similar repositories for moderation-api-release:
Users who are interested in moderation-api-release are comparing it to the repositories listed below:
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Updated last year
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆73 · Updated this week
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆77 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆56 · Updated last month
- Inspecting and Editing Knowledge Representations in Language Models ☆111 · Updated last year
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆108 · Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆66 · Updated 10 months ago
- A simple evaluation of generative language models and safety classifiers. ☆36 · Updated 5 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆200 · Updated 2 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated 11 months ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- PASTA: Post-hoc Attention Steering for LLMs ☆109 · Updated last month
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆217 · Updated last year
- This project studies the performance and robustness of language models and task-adaptation methods. ☆142 · Updated 8 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆124 · Updated 10 months ago
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆215 · Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆162 · Updated last year
- LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces ☆85 · Updated last year
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆92 · Updated 8 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆174 · Updated 3 months ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆207 · Updated last year
- Code and datasets for the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆88 · Updated 10 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆77 · Updated 5 months ago
- Dataset associated with the paper "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" ☆70 · Updated 3 years ago
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆291 · Updated 7 months ago