aiverify-foundation / moonshot-data
Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)
☆39 · Updated last week
Alternatives and similar repositories for moonshot-data
Users interested in moonshot-data are comparing it to the libraries listed below
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Updated last year
- Moonshot - A simple and modular tool to evaluate and red-team any LLM application. ☆295 · Updated this week
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆116 · Updated this week
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆53 · Updated last year
- ☆34 · Updated last year
- Code for the paper "Fishing for Magikarp" ☆178 · Updated 7 months ago
- ☆42 · Updated last year
- The Granite Guardian models are designed to detect risks in prompts and responses. ☆126 · Updated 3 months ago
- AI Verify ☆39 · Updated this week
- Red-Teaming Language Models with DSPy ☆249 · Updated 10 months ago
- ☆50 · Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated last year
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆311 · Updated last year
- Codebase release for an EMNLP 2023 paper ☆19 · Updated 3 months ago
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆212 · Updated this week
- Improving Alignment and Robustness with Circuit Breakers ☆252 · Updated last year
- Papers about red teaming LLMs and multimodal models. ☆159 · Updated 7 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated last year
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆157 · Updated 7 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆102 · Updated last year
- autoredteam: code for training models that automatically red-team other language models ☆15 · Updated 2 years ago
- ☆297 · Updated this week
- ☆92 · Updated 3 weeks ago
- ☆39 · Updated 2 years ago
- Collection of evals for Inspect AI ☆332 · Updated this week
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆302 · Updated last year
- Open Implementations of LLM Analyses ☆107 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆179 · Updated 9 months ago
- An open-source compliance-centered evaluation framework for Generative AI models ☆178 · Updated 2 weeks ago
- AuditNLG: Auditing Generative AI Language Modeling for Trustworthiness ☆101 · Updated 11 months ago