☆161Aug 9, 2022Updated 3 years ago
Alternatives and similar repositories for moderation-api-release
Users that are interested in moderation-api-release are comparing it to the libraries listed below.
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs☆123Dec 2, 2024Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆135Feb 24, 2025Updated last year
- Chef cookbooks for managing a Ceph cluster☆12Apr 2, 2023Updated 3 years ago
- Fluentd output plugin that sends events to Amazon Kinesis Streams and Amazon Kinesis Firehose.☆13Apr 2, 2023Updated 3 years ago
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif…☆12Oct 21, 2022Updated 3 years ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed.☆127Jun 3, 2026Updated last week
- ☆27Nov 20, 2023Updated 2 years ago
- a python3 compatible pyconfigatron☆10Oct 17, 2016Updated 9 years ago
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).☆180Oct 27, 2023Updated 2 years ago
- ☆10Oct 31, 2022Updated 3 years ago
- ☆70Sep 30, 2025Updated 8 months ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration☆51Oct 4, 2021Updated 4 years ago
- Websockify is a WebSocket to TCP proxy/bridge. This allows a browser to connect to any application/server/service. Implementations in Py…☆29Nov 7, 2016Updated 9 years ago
- Code release for "Understanding Bias in Large-Scale Visual Datasets"☆23Dec 4, 2024Updated last year
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"☆13Dec 13, 2024Updated last year
- ☆38Updated this week
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"☆1,837Jun 17, 2025Updated 11 months ago
- ☆19Mar 25, 2024Updated 2 years ago
- ☆134Nov 13, 2023Updated 2 years ago
- Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025☆31Apr 8, 2025Updated last year
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes☆12Jun 12, 2023Updated 3 years ago
- ☆45Oct 1, 2024Updated last year
- ☆29Mar 20, 2024Updated 2 years ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆976Aug 16, 2024Updated last year
- A fast + lightweight implementation of the GCG algorithm in PyTorch☆336May 13, 2025Updated last year
- ☆45Jun 19, 2025Updated 11 months ago
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.☆346Jun 17, 2024Updated last year
- This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…☆22Jul 3, 2024Updated last year
- This repository contains the training and evaluation code for llm-jp-modernbert-base.☆17Jun 17, 2025Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆63Aug 30, 2024Updated last year
- Causal Analysis of Agent Behavior for AI Safety☆20Jun 27, 2023Updated 2 years ago
- Lightblue LLM Eval Framework: tengu, elyza100, ja-mtbench, rakuda☆18Apr 29, 2026Updated last month
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]☆21May 2, 2024Updated 2 years ago
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!☆359Oct 17, 2025Updated 7 months ago
- ☆60Mar 9, 2023Updated 3 years ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆105Jan 11, 2026Updated 5 months ago
- A simple evaluation of generative language models and safety classifiers.☆100Apr 15, 2026Updated last month
- An Empirical Study of Memorization in NLP (ACL 2022)☆13Jun 22, 2022Updated 3 years ago
- ☆27Jun 5, 2024Updated 2 years ago