openai / moderation-api-release

☆142

Alternatives and similar repositories for moderation-api-release

Users who are interested in moderation-api-release are comparing it to the repositories listed below.

allenai / real-toxicity-prompts
☆220 Updated 4 years ago
anthropics / ConstitutionalHarmlessnessPaper
☆241 Updated 2 years ago
microsoft / TOXIGEN
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
☆331 Updated last year
tatsu-lab / opinions_qa
☆115 Updated last year
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆91 Updated 10 months ago
AI-secure / DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
☆303 Updated last year
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆233 Updated last year
facebookresearch / ResponsibleNLP
Repository for research in the field of Responsible NLP at Meta.
☆202 Updated 4 months ago
mlcommons / modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
☆105 Updated this week
neelsjain / BYOD
The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"
☆107 Updated 2 years ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆116 Updated 2 years ago
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116 Updated 7 months ago
nyu-mll / BBQ
Repository for the Bias Benchmark for QA dataset.
☆128 Updated last year
google-research / true
Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation".
☆81 Updated 2 months ago
declare-lab / red-instruct
Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment"
☆105 Updated last year
tomekkorbak / pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
☆180 Updated last year
facebookresearch / Shepherd
This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆217 Updated 2 years ago
google-research / lm-extraction-benchmark
☆293 Updated 2 months ago
allenai / wimbd
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
☆224 Updated 10 months ago
asahi417 / lmppl
Calculate perplexity on a text with pre-trained language models. Supports MLM (e.g. DeBERTa), recurrent LM (e.g. GPT3), and encoder-decoder …
☆162 Updated 3 months ago
mingkaid / rl-prompt
Accompanying repo for the RLPrompt paper
☆355 Updated last year
centerforaisafety / wmdp
WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…
☆145 Updated 4 months ago
amazon-science / bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper
☆80 Updated 4 years ago
google / sycophancy-intervention
Scripts for generating synthetic finetuning data for reducing sycophancy.
☆116 Updated 2 years ago
ryoungj / ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
☆165 Updated last year
Libr-AI / do-not-answer
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
☆292 Updated last year
tianjunz / HIR
☆159 Updated 2 years ago
hendrycks / ethics
Aligning AI With Shared Human Values (ICLR 2021)
☆299 Updated 2 years ago
martiansideofthemoon / ai-detection-paraphrases
Official repository for our NeurIPS 2023 paper "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense…
☆175 Updated last year
realtimeqa / realtimeqa_public
☆78 Updated last year