☆161 · Aug 9, 2022 · Updated 3 years ago
Alternatives and similar repositories for moderation-api-release
Users who are interested in moderation-api-release are comparing it to the libraries listed below.
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆127 · Dec 2, 2024 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆137 · Feb 24, 2025 · Updated last year
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆127 · Updated this week
- ☆27 · Nov 20, 2023 · Updated 2 years ago
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs). ☆180 · Oct 27, 2023 · Updated 2 years ago
- ☆10 · Oct 31, 2022 · Updated 3 years ago
- ☆70 · Sep 30, 2025 · Updated 9 months ago
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆51 · Oct 4, 2021 · Updated 4 years ago
- [ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning ☆23 · Jul 8, 2024 · Updated last year
- Code release for "Understanding Bias in Large-Scale Visual Datasets" ☆24 · Dec 4, 2024 · Updated last year
- ☆38 · Updated this week
- Repository for public code and data associated with the paper "Fake News on Twitter During the 2016 U.S. Presidential Election" ☆12 · Dec 5, 2019 · Updated 6 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,845 · Jun 17, 2025 · Updated last year
- ☆19 · Mar 25, 2024 · Updated 2 years ago
- ☆11 · Apr 13, 2023 · Updated 3 years ago
- ☆134 · Nov 13, 2023 · Updated 2 years ago
- Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025 ☆31 · Apr 8, 2025 · Updated last year
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes ☆12 · Jun 12, 2023 · Updated 3 years ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆356 · Feb 23, 2024 · Updated 2 years ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆991 · Aug 16, 2024 · Updated last year
- A fast + lightweight implementation of the GCG algorithm in PyTorch